Read This First


The purpose of this user’s guide is to serve as a reference book for the 
TMS320C40 and TMS320C40-40 digital signal processors. Throughout the 
book, all references to the TMS320C40 apply to the TMS320C40-40 as 
well, unless an exception is noted. This document provides information to 
assist managers and hardware/software engineers in application develop- 
ment. 


How to Use This Manual

This document contains the following chapters:

Chapter 1 Introduction
A general description of the TMS320C40, its key features, and typical
applications.

Chapter 2 Architectural Overview
Functional block diagrams. TMS320C40 design description, hardware
components, and device operation. Instruction set summary.

Chapter 3 CPU Registers, Memory, and Cache
Description of the registers in the CPU primary register file and expansion
register file. Memory maps. Instruction cache architecture, algorithm, and
control bits.

Chapter 4 Data Formats and Floating-Point Operation
Description of signed and unsigned integer and floating-point formats.
Discussion of floating-point multiplication, addition, subtraction,
normalization, rounding, conversions, and reciprocals.

Chapter 5 Addressing
Addressing types. Operation, encoding, and implementation of addressing
modes. Format descriptions. Circular and bit-reversed addressing. System
stack management.

Chapter 6 Program Flow Control
Software control of program flow using repeat modes, different types of
branching, traps, interrupts, and interlocked operations. Reset operation,
including resulting values in registers and on pins.

Chapter 7 External Bus Operation
Discussion of the two 80-pin local and global memory interfaces.
Programmable wait-states. Memory access timing. Signal group control.
Interlocked instructions. Interrupt acknowledge timing.

Chapter 8 Communication Ports
Description of the six bidirectional, 160-megabit-per-second (at 40-ns
cycle time) communication ports designed for sharing tasks between
processors. Memory maps of the ports and their registers. Port operation
and coordination of port activity with CPU and DMA coprocessors.

Chapter 9 DMA Coprocessors and ’C40 Timers
DMA coprocessor operation. Description of coprocessor registers (channel
control, channel address, index, transfer count, and link pointer). Use in
unified and split mode. Priority and CPU/DMA arbitration. Autoinitialization
and interrupts. Operation of the ’C40 timers; their registers (global control,
timer counter, and period).

Chapter 10 Pipeline Operation
Discussion of ’C40 pipeline operations. This includes pipeline conflicts and
methods for resolving these. Clocking of memory accesses.

Chapter 11 Assembly Language Instructions
Functional listing of instructions. Condition code definitions (for conditional
instructions such as branch conditional). Alphabetized individual instruction
descriptions with examples.

Chapter 12 Software Applications
Software application examples for using various TMS320C40
instruction-set and programming features. Code listings enhance
explanations.

Chapter 13 Hardware Applications
Hardware design techniques and application examples for interfacing to
memories, peripherals, or other microcomputers/microprocessors. Code
listings, schematics, and timing diagrams facilitate explanations.

Chapter 14 TMS320C4x Signal Descriptions and Electrical Characteristics
Pin locations and pin descriptions. ’C40 dimensions and package
description. Electrical characteristics. Signal timing and characteristics.

Appendix A TMS320C40 Sockets
Two sockets available for the TMS320C40. 


Appendix B XDS510 Design Considerations 
Considerations for designing your TMS320C40 target system for use with 
the XDS510 emulator. 

Style and Symbol Conventions
This document uses the following conventions:


- Program listings, program examples, interactive displays, file names,
  and symbol names are shown in a special font. Examples use a bold
  version of the special font for emphasis. Here is a sample program
  listing:

  0011 0005 0001 .field 1, 2
  0012 0005 0003 .field 3, 4
  0013 0005 0006 .field 6, 3
  0014 0006      .even


- In syntax descriptions, the instruction, command, or directive is in a
  bold face font and parameters are in italics. Portions of a syntax that
  are in bold face should be entered as shown; portions of a syntax that
  are in italics describe the type of information that should be entered.
  Here is an example of an instruction:

  CMPF3 src2,src3

  CMPF3 is the instruction mnemonic. This instruction has two
  parameters, indicated by src2 and src3.

  Note: Although the instruction mnemonic (CMPF3 in this example) is in
  capital letters, the ’C40 assembler is not case sensitive; it can
  assemble mnemonics entered in either upper or lower case.

- Square brackets ([ and ]) identify an optional parameter. If you use an
  optional parameter, you must specify the information within the
  brackets; however, you don’t enter the brackets themselves. Here’s an
  example of an instruction that has an optional parameter:

  LDP src [,DP]

  The LDP instruction is shown with two parameters; one is optional. The
  first parameter, src, is required. The second parameter, DP, is optional.
  As this syntax shows, if you use the optional second parameter, you
  must precede it with a comma.


- Braces ({ and }) indicate a list. The symbol | (read as or) separates
  items within the list. Here’s an example of a list:

  { * | *+ | *- }

  This provides three choices: *, *+, or *-.

  Unless the list is enclosed in square brackets, you must choose one
  item from the list.


- The following is the format for a varying number of parameters. For
  example, the .byte directive can have up to 100 parameters. The syntax
  for this directive is

  .byte value1 [, ... , valuen]

  This syntax shows that .byte must have at least one value parameter,
  but you have the option of supplying additional value parameters
  separated by commas.

Texas Instruments’ TMS320C4x generation floating-point processors are
designed specifically to meet the needs of parallel processing and other 
real-time embedded applications. TMS320C4x products consist of both 
parallel processing devices and development tools. With world-class 
parallel-processing development tools, designers are able to fully utilize the 
immense performance of 275 MOPS (millions of operations per second) 
and 320 Mbytes per second throughput made available by the TMS320C4x 
generation. 


This chapter provides a brief overview of the TMS320C4x generation. Major
topics covered are as follows: 

1.1 The TMS320 Family


The TMS320C4x is one of five generations in the TMS320 family of digital
signal processors. The TMS320C1x, TMS320C2x, and TMS320C5x offer
designers a complete line of general-purpose and application-specific
fixed-point DSPs. The TMS320C3x and TMS320C4x generations round out the
TMS320 family, providing an ensemble of floating-point DSPs. The
TMS320 family has blossomed from a single device introduced in 1982, the
TMS32010, to nearly thirty different products across five CPU architectures.
On-chip hardware multipliers, register files, barrel shifters, ALUs, ROM,
RAM, caches, and I/O peripherals, along with massive internal busing (all
within a product as programmable as a general-purpose microprocessor),
make TI’s TMS320 devices ideal for the gamut of computer-intensive
applications.


Figure 1-1. TMS320 Family of Devices

1.2 Parallel Processing


The need for parallel processing is quickly growing. As floating-point per- 
formance requirements grow exponentially, semiconductor manufacturers 
can no longer meet the need with single processing elements. Processors 
not designed for parallel processing are inadequate for the task, as interpro- 
cessor communication quickly saturates device I/O and adversely affects 
computing efficiency. Products in the TMS320C3x generation made the first 
step in addressing the need for parallel processing by providing designers 
with two external interface ports, each with a comprehensive memory inter- 
face. This yields an immense amount of I/O bandwidth. Devices in the 
TMS320C4x generation go several steps further by incorporating on-chip 
hardware to facilitate high-speed interprocessor communication and con- 
current I/O without degrading CPU performance. These features, coupled 
with a host of sophisticated parallel processing development tools, make 
the TMS320C4x generation of floating-point processors ideal for real-time
embedded applications. 

1.3 TMS320C4x Features


The TMS320C4x generation consists of two equally important aspects:
parallel processing devices and parallel processing development tools.


1.3.1 TMS320C40 Device Key Features
The primary features of the TMS320C4x devices are:


- Six communication ports for high-speed interprocessor communication.
  Communication port key features include:

  - 20-Mbytes/sec asynchronous transfer rate at each port for maximum
    data throughput
  - Direct (glueless) processor-to-processor communication for ease
    of use
  - Bidirectional transfers for maximum communication flexibility


- Six-channel DMA coprocessor for concurrent I/O and CPU operation,
  thereby maximizing sustained CPU performance by relieving the CPU
  of burdensome I/O. DMA coprocessor key features include:

  - Concurrent data transfers and CPU operation for sustained CPU
    performance
  - Self-programming (autoinitialize) capability for each channel,
    thereby not requiring the CPU for initialization, maximizing
    sustained CPU performance
  - Data transfers to and from anywhere in the processor’s memory
    map for maximum flexibility


- High-performance DSP CPU capable of 275 MOPS and 320 Mbytes/sec.
  CPU key features include:

  - Eleven operations per cycle throughput, resulting in massive
    computing parallelism and sustained CPU performance
  - 40-ns and 50-ns instruction cycle times
  - 40/32-bit single-cycle floating-point/integer multiplier for high
    performance in computationally intensive algorithms
  - Single-cycle IEEE floating-point conversion for efficient interface to
    IEEE-compatible processors
  - Hardware divide and inverse square root support for high
    performance
  - Byte and half-word manipulation capabilities for fast data
    (un)packing
  - Source code compatible with TMS320C3x generation for easy
    upward and downward mobility
  - Support for linear, circular, and bit-reversed addressing for high
    performance
  - Single-cycle branches, calls, and returns for fast program control
  - Single-cycle barrel shifter for 0- to 31-bit right or left shifts for
    fast bit manipulation
  - Relocatable reset and interrupt vectors for easy integration into
    parallel processing systems


- Two identical external data and address buses supporting shared
  memory systems and high data rate, single-cycle transfers. Key
  features include:

  - High port data-transfer rate of 100 Mbytes/sec
  - 16-Gbyte continuous program/data/peripheral address space for
    maximum design flexibility
  - Status pins that signal type of memory access requested for fast,
    intelligent bus arbitration in shared memory systems
  - Separate address, data, and control-enable pins for high-speed
    bus arbitration
  - Four sets of memory-control signals support different speed
    memories in hardware, enabling efficient use of low- and
    high-speed memories


- On-chip analysis module supporting efficient, state-of-the-art parallel
  processing debug. Key features include:

  - Separate breakpoint comparators for program, data, and DMA
    accesses, providing on-chip hardware breakpoint capabilities for fast
    debug and development
  - Discontinuity stack for hardware trace, facilitating fast debug and
    development
  - Event counter for accurate benchmarking and profiling
  - JTAG interface for standard system connection

- On-chip program cache and dual-access/single-cycle RAM for
  increased memory access performance. On-chip memory key features
  include:

  - 512-byte instruction cache for increased system performance
  - 8K bytes of single-cycle dual-access program or data RAM for
    increased system performance and lower system cost
  - Bootloader (ROM based) supporting program bootup via 8-, 16-, or
    32-bit memories or over any one of the communication ports


- Separate internal program, data, and DMA coprocessor buses for
  support of massive concurrent I/O of program and data throughput, thereby
  maximizing sustained CPU performance.


Summed up, the total device performance is 275 MOPS and 320 Mbytes/sec,
as noted below.

TMS320C40 Performance (40-ns Cycle Time)

Sustained I/O:
- Communication Ports
- DMA Coprocessor
- Global and Local Buses

Sustained Computation:
- DMA Coprocessor
- High-Performance CPU

DATA THROUGHPUT
Global Port                 100 Mbytes/sec
Local Port                  100 Mbytes/sec
6 Com Ports                 120 Mbytes/sec
TOTAL I/O                 = 320 Mbytes/sec

CPU: 8 OPS/Cycle = 200 MOPS
2 Data Accesses              50 MOPS
1 FP Multiply                25 MOPS
1 FP ALU Operation           25 MOPS
2 Addr. Register Mods        50 MOPS
1 Loop Counter Update        25 MOPS
1 Branch                     25 MOPS

DMA COPROCESSOR: 3 OPS/Cycle = 75 MOPS
1 Data Access                25 MOPS
1 Addr. Register Mod         25 MOPS
1 Transfer Counter Update    25 MOPS

TOTAL MOPS = 275 MOPS
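
The totals above follow directly from the 40-ns cycle time. The short Python sketch below reproduces the arithmetic. The variable and function names are illustrative only, and the 4-bytes-per-cycle bus figure is an assumption derived from the 32-bit data buses and single-cycle transfers described in the feature list.

```python
# Arithmetic check of the TMS320C40 performance summary (illustrative names).
CYCLE_TIME_NS = 40
CYCLES_PER_US = 1000 / CYCLE_TIME_NS  # 25 cycles per microsecond (25 MHz)

# Operations that can complete in one cycle, as listed in the summary.
CPU_OPS_PER_CYCLE = {
    "data accesses": 2,
    "FP multiply": 1,
    "FP ALU operation": 1,
    "address register modifications": 2,
    "loop counter update": 1,
    "branch": 1,
}
DMA_OPS_PER_CYCLE = {
    "data access": 1,
    "address register modification": 1,
    "transfer counter update": 1,
}


def mops(ops_per_cycle):
    """Millions of operations per second at the configured cycle time."""
    return ops_per_cycle * CYCLES_PER_US


cpu_mops = mops(sum(CPU_OPS_PER_CYCLE.values()))
dma_mops = mops(sum(DMA_OPS_PER_CYCLE.values()))
total_mops = cpu_mops + dma_mops

# Assumption: each external bus moves one 32-bit word (4 bytes) per cycle.
BUS_BYTES_PER_CYCLE = 4
bus_mbytes = BUS_BYTES_PER_CYCLE * CYCLES_PER_US
COM_PORT_MBYTES = 20  # per-port rate quoted in the feature list
total_io_mbytes = 2 * bus_mbytes + 6 * COM_PORT_MBYTES
com_port_mbits = COM_PORT_MBYTES * 8

print(cpu_mops, dma_mops, total_mops, total_io_mbytes, com_port_mbits)
```

Changing CYCLE_TIME_NS to 50 gives the corresponding figures for the 50-ns device under the same assumptions.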

1.3.2 Communication Port Benefits


Without the six communication ports, 120 Mbytes/sec of processor
throughput must be squeezed over one or both of the external memory
interfaces, thereby saturating processor throughput and turning the system
into a complex shared-memory architecture. With the communication ports,
bandwidth is plentiful (illustrated in Figure 1-2).


Figure 1-2. TMS320C40 Throughput Increases Use of Communication Ports 

1.3.3 DMA Coprocessor Benefits


Without the DMA coprocessor, the CPU would have to use computational 
MOPS to transfer data within the processor’s memory map. With the DMA 
coprocessor, the CPU can focus its entire 200 MOPS of performance on 
quality computational tasks while the DMA coprocessor takes care of the 
burdensome I/O. This is illustrated in Figure 1-3. 


Figure 1-3. TMS320C40 Throughput Increases Use of DMA Coprocessor

1.3.4 TMS320C40 Parallel Processing Development Tools Key Features


The primary TMS320C4x development tools are as follows: 


- Parallel processing in-circuit emulator (XDS510)

  - Able to debug both C and assembly code simultaneously using the
    graphical user-interface based source-level debugger
  - Can debug any number of TMS320C4x devices in a system with a
    single XDS510 controller card
  - Can globally stop, start, and single step all or any combination of
    ’C40s in a system


- Parallel processing development system

  - Host-independent evaluation board with four ’C40s
  - Each ’C40 connected to every other ’C40 via their communication
    ports, enabling designers to efficiently test different system
    topologies
  - Interfaces directly to XDS510 emulator, creating a complete
    parallel processing development environment


- Parallel processing optimizing ANSI C compiler

  - Parallel runtime support library for easy implementation of data and
    message passing between tasks (or processors) in parallel
    processing systems
  - C-source and target-specific optimizations for dense, optimal code
  - Plum-Hall validated to ANSI standard for maximum code portability


- SPOX parallel processing DSP operating system

  - Parallel processing support for easy message passing within a
    multitasking environment
  - Communication port, DMA coprocessor, and memory interface
    drivers for fast development of C code without detailed knowledge
    of the hardware
  - Multitasking real-time kernel for fast implementation of
    multitasking systems
  - DSP math library for fast development of DSP applications (using
    optimized assembly language routines)


- Parallel processing assembler/linker

  - Directives to map program and data code on specific processors for
    fast integration and debug of parallel processing code
  - Relocatable modules for maximum code flexibility

- Hardware verification and full functional models

  - Simulation of multiple ’C40s and associated logic for accurate
    development (via software simulation) of parallel processing
    systems
  - Accurate simulation of device bus cycles and functional execution
    for fast development of product hardware
  - Supports various workstation and PC environments

- State-accurate simulator

  - Provides cycle-by-cycle simulation of all aspects of the
    TMS320C4x
  - Low-cost way to simulate key software kernels
  - Supported on a host of workstation and PC platforms

1.4 Applications


Below is a list of classical DSP applications, along with a number of
embedded real-time applications that need the computational
performance offered by TMS320 devices. Real-time performance, low
device cost, and comprehensive development tools are the primary
aspects that make Texas Instruments TMS320 devices the preferred
solution in the following applications:


Figure 1-4. Matrix of TMS320 DSP Applications

General-Purpose DSP
  Digital Filtering
  Convolution
  Correlation
  Hilbert Transforms
  Fast Fourier Transforms
  Adaptive Filtering
  Windowing
  Waveform Generation

Voice/Speech
  Voice Mail
  Speech Vocoding
  Speech Recognition
  Speaker Verification
  Speech Enhancement
  Speech Synthesis
  Text-to-Speech
  Neural Networks

Graphics/Imaging
  3-D Transformations Rendering
  Robot Vision
  Image Transmission/Compression
  Pattern Recognition
  Image Enhancement
  Homomorphic Processing
  Workstations

Control
  Disk Control
  Servo Control
  Robot Control
  Laser Printer Control
  Engine Control
  Motor Control
  Kalman Filtering

Telecommunications
  Echo Cancellation
  ADPCM Transcoders
  Digital PBXs
  Line Repeaters
  Channel Multiplexing
  1200- to 19200-bps Modems
  Adaptive Equalizers
  DTMF Encoding/Decoding
  Data Encryption
  FAX
  Cellular Telephones
  Speaker Phones
  Digital Speech Interpolation (DSI)
  X.25 Packet Switching
  Video Conferencing
  Spread Spectrum Communications

Instrumentation
  Spectrum Analysis
  Function Generation
  Pattern Matching
  Seismic Processing
  Transient Analysis
  Digital Filtering
  Phase-Locked Loops

Military
  Secure Communications
  Radar Processing
  Sonar Processing
  Image Processing
  Navigation
  Missile Guidance
  Radio Frequency Modems
  Sensor Fusion

Automotive
  Engine Control
  Vibration Analysis
  Antiskid Brakes
  Adaptive Ride Control
  Global Positioning Navigation
  Voice Commands
  Digital Radio
  Cellular Telephones

Consumer
  Radar Detectors
  Power Tools
  Digital Audio/TV
  Music Synthesizer
  Toys and Games
  Solid-State Answering Machines

Industrial
  Robotics
  Numeric Control
  Security Access
  Power Line Monitors
  Visual Inspection
  Lathe Control
  CAM

Medical
  Hearing Aids
  Patient Monitoring
  Ultrasound Equipment
  Diagnostic Tools
  Prosthetics
  Fetal Monitors
  MR Imaging

The TMS320C40’s high performance is achieved through the precision and
wide dynamic range of the floating-point units, large on-chip memory, a high 
degree of parallelism, and the six-channel DMA coprocessor. Figure 2-1, 
beginning on the next page, is a block diagram of the TMS320C40. 


This chapter gives an architectural overview of the TMS320C40 processor. 
Major areas of discussion are listed below. 

Figure 2-1. TMS320C40 Block Diagram

2.1 Central Processing Unit (CPU)


The TMS320C40 has a register-based CPU architecture. The CPU
comprises the following components:

- Floating-point/integer multiplier
- ALU for performing floating-point, integer, and logical operations
- 32-bit barrel shifter
- Internal buses (CPU1/CPU2 and REG1/REG2)
- Auxiliary register arithmetic units (ARAUs)
- CPU register file


Figure 2-2 shows the various CPU components that are discussed in the
succeeding subsections.


2.1.1 Multiplier


The multiplier performs single-cycle multiplications on 32-bit integer and 
40-bit floating-point values. The TMS320C40 implementation of float- 
ing-point arithmetic allows for floating-point operations at fixed-point 
speeds via a 40-ns instruction cycle and a high degree of parallelism. To 
gain even higher throughput, you can use parallel instructions to perform a 
multiply and ALU operation in a single cycle. 


When the multiplier performs floating-point multiplication, the inputs are 
40-bit floating-point numbers, and the result is a 40-bit floating-point num- 
ber. When the multiplier performs integer multiplication, the input data is 32 
bits and yields either the 32 most significant bits or 32 least significant bits 
of the resulting 64-bit product. Refer to Chapter 4 for detailed information 
on data formats and floating-point operation. 
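
As an illustration of the integer case described above, the following Python sketch forms the 64-bit product of two 32-bit operands and splits it into its 32 most significant and 32 least significant bits. The function names are hypothetical, and treating the operands as signed two's-complement values is an assumption made for this example; it models only the arithmetic, not the device's instructions or timing.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def multiply_int32(a, b):
    """Return (upper 32 bits, lower 32 bits) of the 64-bit signed product."""
    product = (to_signed32(a) * to_signed32(b)) & MASK64
    return (product >> 32) & MASK32, product & MASK32


upper, lower = multiply_int32(0x00010000, 0x00010000)  # 2**16 * 2**16 = 2**32
```

A program that needs the full 64-bit result would take both halves; one that only needs a 32-bit result takes whichever half suits its scaling.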


2.1.2 Arithmetic Logic Unit (ALU) 




The ALU performs single-cycle operations on 32-bit integer, 32-bit logical, 
and 40-bit floating-point data, including single-cycle integer and float- 
ing-point conversions. Results of the ALU are always maintained in 32-bit 
integer or 40-bit floating-point formats. The barrel shifter is used to shift up 
to 32 bits left or right in a single cycle. 


Internal buses, CPU1/CPU2 and REG1/REG2, carry two operands from 
memory and two operands from the register file, thus allowing parallel multi- 
plies and adds/subtracts on four integer or floating-point operands in a 
single cycle. 

2.1.3 Auxiliary Register Arithmetic Units (ARAUs)


Two auxiliary register arithmetic units (ARAU0 and ARAU1) can generate
two addresses in a single cycle. The ARAUs operate in parallel with the
multiplier and ALU. They support addressing with displacements, index
registers (IR0 and IR1), and circular and bit-reversed addressing. Refer to
Chapter 5 for a description of addressing modes.


2.1.4 CPU Primary Register File 


The TMS320C40 primary register file provides 32 registers in a multiport 
register file that is tightly coupled to the CPU. Table 2—1 lists register names 
and functions, followed by the section number and page of each description. 
(The expansion register file is described in subsection 2.1.5 on page 2-9.) 


All of the primary register file registers can be operated upon by the
multiplier and ALU, and can be used as general-purpose registers. However,
the registers also have some special functions. For example, the 12
extended-precision registers are especially suited for maintaining
floating-point results. The eight auxiliary registers support a variety of
indirect addressing modes and can be used as general-purpose 32-bit integer
and logical registers. The remaining registers provide system functions such
as addressing, stack management, processor status, interrupts, and block
repeat. Refer to Chapter 3 for detailed information on the CPU registers.
Refer to Chapter 5 for register usage in addressing.


The extended-precision registers (R0-R11) are capable of storing and
supporting operations on 32-bit integer and 40-bit floating-point numbers.
Any instruction that assumes the operands are floating-point numbers uses
bits 39-0. If the operands are either signed or unsigned integers, only bits
31-0 are used, and bits 39-32 remain unchanged. This is true for all shift
operations. Refer to Chapter 4 for extended-precision register formats for
floating-point and integer numbers.
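
The bit-field behavior described above can be sketched as a small Python model. The class and method names are hypothetical and exist only for this illustration; the model captures just the rule that integer results replace bits 31-0 while bits 39-32 keep their previous contents.

```python
MASK32 = (1 << 32) - 1
MASK40 = (1 << 40) - 1


class ExtendedPrecisionRegister:
    """Toy model of one 40-bit extended-precision register."""

    def __init__(self, bits=0):
        self.bits = bits & MASK40

    def write_float40(self, raw40):
        # Floating-point results use all 40 bits (39-0).
        self.bits = raw40 & MASK40

    def write_int32(self, value):
        # Integer results replace bits 31-0; bits 39-32 are left unchanged.
        upper = self.bits & (MASK40 & ~MASK32)
        self.bits = upper | (value & MASK32)

    def read_int32(self):
        return self.bits & MASK32


reg = ExtendedPrecisionRegister(0xAB12345678)
reg.write_int32(0xDEADBEEF)  # upper byte 0xAB is preserved
```

One practical consequence, under this model, is that stale upper bits can survive an integer operation, so a value should be rewritten in floating-point form before it is used as a floating-point operand.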


The 32-bit auxiliary registers (AR0-AR7) can be accessed by the CPU
and modified by the two auxiliary register arithmetic units (ARAUs). The pri- 
mary function of the auxiliary registers is the generation of 32-bit addresses. 
They can also be used as loop counters or as 32-bit general-purpose regis- 
ters that can be modified by the multiplier and ALU. Refer to Chapter 5 for 
detailed information and examples of the use of auxiliary registers in ad- 
dressing. 

Table 2-1. CPU Primary Registers

Register Name   Assigned Function Name
R0              Extended-precision register 0
R1              Extended-precision register 1
R2              Extended-precision register 2
R3              Extended-precision register 3
R4              Extended-precision register 4
R5              Extended-precision register 5
R6              Extended-precision register 6
R7              Extended-precision register 7
R8              Extended-precision register 8
R9              Extended-precision register 9
R10             Extended-precision register 10
R11             Extended-precision register 11

AR0             Auxiliary register 0
AR1             Auxiliary register 1
AR2             Auxiliary register 2
AR3             Auxiliary register 3
AR4             Auxiliary register 4
AR5             Auxiliary register 5
AR6             Auxiliary register 6
AR7             Auxiliary register 7

DP              Data-page pointer
IR0             Index register 0
IR1             Index register 1
BK              Block-size register
SP              System stack pointer

ST              Status register
DIE             DMA coprocessor interrupt enable register
IIE             Internal-interrupt enable register
IIF             IIOF flag register

RS              Repeat start address
RE              Repeat end address
RC              Repeat counter


The data page pointer (DP) is a 32-bit register. The 16 LSBs of the data 
page pointer are used by the direct addressing mode as a pointer to the page 
of data being addressed. The ’C40 can address up to 64K pages, each page 
containing 64K words. The data page pointer is illustrated in Figure 5-1
on page 5-4. 
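
The page arithmetic can be sketched in Python as follows. The function name is hypothetical, and the assumption that the instruction supplies a 16-bit offset within the page is inferred from the 64K-word page size stated above; Chapter 5 gives the actual encoding.

```python
PAGE_BITS = 16
OFFSET_BITS = 16


def direct_address(dp, offset):
    """Combine the 16 LSBs of DP (page) with a 16-bit in-page offset."""
    page = dp & ((1 << PAGE_BITS) - 1)         # only the 16 LSBs of DP are used
    in_page = offset & ((1 << OFFSET_BITS) - 1)
    return (page << OFFSET_BITS) | in_page


pages = 1 << PAGE_BITS             # 64K pages
words_per_page = 1 << OFFSET_BITS  # 64K words per page
address = direct_address(0x0012, 0x3456)
```

Under these assumptions, 64K pages of 64K words cover the full 32-bit word address space.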


The 32-bit index registers contain the value used by the auxiliary register
arithmetic unit (ARAU) to compute an indexed address. Refer to Chapter
5 for examples of the use of index registers in addressing (see subsection
5.1.3, page 5-5, and Section 5.4, page 5-30).


The ARAU uses the 32-bit block size register (BK) in circular addressing 
to specify the data block size. (Circular addressing is described in Section 
5.3 on page 5-25.) 
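
As a rough illustration of what the block size controls, the Python sketch below wraps an index within a buffer of BK words. This is a simplified model with hypothetical names: it works on an index relative to the start of the buffer and ignores the buffer alignment rules and exact update algorithm of the device, which are given in Section 5.3.

```python
def circular_step(index, step, block_size):
    """Advance index by step within a circular buffer of block_size words."""
    if block_size <= 0:
        raise ValueError("block size must be positive")
    if not 0 <= index < block_size:
        raise ValueError("index must lie inside the block")
    if abs(step) > block_size:
        raise ValueError("step must not exceed the block size")
    # Python's % returns a non-negative result for a positive modulus,
    # so both forward and backward steps wrap correctly.
    return (index + step) % block_size


# Walk a 6-word buffer two words at a time, starting at index 0.
visited = []
position = 0
for _ in range(6):
    visited.append(position)
    position = circular_step(position, 2, 6)
```

A FIR filter delay line is the typical use: the newest sample overwrites the oldest without any data being moved.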

The system stack pointer (SP) is a 32-bit register that contains the
address of the top of the system stack. The SP always points to the last
element pushed onto the stack. A push performs a preincrement, and a pop
performs a postdecrement of the system stack pointer. The SP is manipu- 
lated by interrupts, traps, calls, returns, and the PUSH and POP instruc- 
tions. Refer to Section 5.5, page 5-31, for information about system stack 
management. 
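
The push and pop rules above can be modeled in a few lines of Python. The class name and the dictionary used as memory are illustrative assumptions; the sketch shows only the ordering of the pointer update relative to the memory access.

```python
MASK32 = 0xFFFFFFFF


class SystemStack:
    """Toy model of a stack that grows toward higher addresses."""

    def __init__(self, sp):
        self.sp = sp & MASK32
        self.memory = {}  # address -> 32-bit word (stand-in for real memory)

    def push(self, value):
        # Preincrement: SP is advanced first, then the value is stored.
        self.sp = (self.sp + 1) & MASK32
        self.memory[self.sp] = value & MASK32

    def pop(self):
        # Postdecrement: the value at SP is read first, then SP steps back.
        value = self.memory[self.sp]
        self.sp = (self.sp - 1) & MASK32
        return value


stack = SystemStack(0x1000)
stack.push(11)
stack.push(22)
top = stack.pop()
```

Because SP always points at the last element pushed, the word at the initial SP value itself is never written by this model.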


The status register (ST) contains global information relating to the state
of the CPU. Typically, operations set the condition flags of the status register
according to whether the result is zero, negative, etc. This includes register
load and store operations as well as arithmetic and logical functions. When
the status register is loaded, however, a bit-for-bit replacement is performed
with the contents of the source operand, regardless of the state of any bits
in the source operand. Therefore, following a load, the contents of the status
register are identically equal to the contents of the source operand. This
allows the status register to be easily saved and restored. See Table 3-2 on
page 3-6 for definitions of the status register bits.


The DMA coprocessor interrupt enable register (DIE) is a 32-bit register 
containing 2- and 3-bit fields to designate the interrupt synchronization 
scheme for each of the six DMA channels. It allows each DMA channel to 
service a corresponding input communication port and output communica- 
tion port. Also, each DMA channel can be synchronized with external inter- 
rupts or the on-chip timers. This register is described in subsection 3.1.8 
on page 3-8. 


The CPU internal interrupt enable register (IIE) is also a 32-bit register
(described in subsection 3.1.9 on page 3-10). This register enables/disables
interrupts for the six communication ports, both timers, and the six
DMA coprocessor channels.


The IIOF flag register (IIF) controls the function (general-purpose I/O or
interrupt) of the four external pins (IIOF0 to IIOF3). Interrupts can be level or
edge triggered. Subsection 3.1.10 on page 3-12 provides further
description.


The 32-bit repeat counter (RC) register specifies the number of times a
block of code is to be repeated when performing a block repeat. When the
processor is operating in the repeat mode, the 32-bit repeat start address
register (RS) contains the starting address of the block of program memory
to be repeated, and the 32-bit repeat end address register (RE) contains
the ending address of the block to be repeated. Further information is in
subsection 3.1.11 on page 3-14.


The program counter (PC) is a 32-bit register containing the address of the
next instruction to be fetched. Although the PC is not part of the CPU register
file, it is a register that can be modified by instructions that modify the
program flow.


2.1.5 CPU Expansion Register File 


In addition to the CPU primary register file (covered in subsection 2.1.4,
starting on page 2-6), the ’C40 has an expansion register file that contains
two special registers that act as pointers:

- IVTP register (points to the interrupt-vector table, which is shown in
  Figure 3-8 on page 3-16)

- TVTP register (points to the trap vector table (TVT), which defines
  vectors for 512 interrupts; this is described in Figure 3-7 on page 3-15)


These two registers are fully described in Section 3.2 on page 3-15. 

2.2 Memory Organization


The total memory reach of the TMS320C40 is 4G (giga or billion) 32-bit
words (16 Gbytes). Program memory (on-chip RAM or ROM and external
memory) as well as registers affecting timers, communication ports, and
DMA channels are contained within this space. This allows tables,
coefficients, program code, and data to be stored in either RAM or ROM. Thus,
memory usage is maximized, and memory space can be allocated as desired.


By manipulating one external pin (ROMEN, pin AK4), the first one-megaword
area of memory (0000 0000h to 000F FFFFh) can be configured to be
part of the local address bus or configured to address the on-chip ROM
when using the boot loader (with remaining space reserved). (This is further
discussed in Section 3.4 on page 3-18.)


2.2.1 RAM, ROM, and Cache 


Figure 2-3 shows how the memory is organized on the TMS320C40. RAM
blocks 0 and 1 are 4K bytes (1K x 32 bits) each. The ROM block is reserved
and contains a boot loader. Each RAM and ROM block is capable of supporting
two accesses in a single cycle. The separate program buses, data
buses, and DMA buses allow for parallel program fetches, data reads and
writes, and DMA operations. For example, the CPU can access two data
values in one RAM block and perform an external program fetch in parallel
with the DMA coprocessor loading another RAM block, all within a single
cycle.


The reserved ROM block (upper right in Figure 2-3) contains a boot loader.
This loader supports loading of program and data at reset time. Loading is
from 8-, 16-, or 32-bit wide memories or any one of the six communication
ports. Section 13.2 (page 13-5) explains the boot loader in detail.


A 128 x 32-bit instruction cache is provided to store often-repeated sections 
of code, thus greatly reducing the number of needed off-chip accesses. This 
allows for code to be stored off-chip in slower, lower-cost memories. The ex- 
ternal buses are also freed for use by the DMA, external memory fetches, 
or other devices in the system. 


For further information about the memory and instruction cache, refer to 
Section 3.4 (memory organization — page 3-18) and Section 3.5 (cache 
memory — page 3-25). 
Figure 2-3. Memory Organization

[Figure: block diagram of the instruction cache (128 x 32, 512 bytes), RAM
block 0 and RAM block 1 (1K x 32, 4K bytes each), and the ROM block, with
the global bus signals (D(31-0), A(30-0), DE, AE, STAT(3-0), LOCK, STRBx,
R/Wx, PAGEx, RDYx, CEx) and the local bus signals (LD(31-0), LA(30-0), LDE,
LAE, LSTAT(3-0), LLOCK, LSTRBx, LR/Wx, LPAGEx, LRDYx, LCEx).]
2.2.2 Memory Maps


Two memory maps are available, as shown in Figure 2-4; the one selected
depends upon the level at external pin ROMEN. Both maps in the figure
illustrate the 4-gigaword reach of the ’C40; however, they differ in the first
1 megaword of memory:


- A one at external pin ROMEN (pin AK4) causes internal ROM to be enabled
  at 0000 0000h with the one-megaword space reserved (0000 0000h -
  000F FFFFh). This is shown in the right side of the figure.

- A zero at ROMEN causes addresses 0000 0000h - 000F FFFFh to be
  accessible on the local bus. This is shown in the left side of the figure.


The rest of the memory map is the same for either level of ROMEN:


- The second megaword of memory is devoted to peripherals (as shown
  in Figure 2-5).

- The third megaword of memory contains the two 1K (4K-byte) blocks
  of RAM (BLK0 and BLK1 as shown at 002F F800h - 002F FFFFh).

- The rest of the first 2 gigawords (0030 0000h - 7FFF FFFFh) is on the
  local bus (external).

- The second 2 gigawords (8000 0000h - FFFF FFFFh) are on the global
  bus (external).
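

The address ranges listed above can be summarized as a lookup. The following is a minimal Python sketch rather than anything the device or its tools provide: the function name classify_address and the region labels are hypothetical, the 002F FC00h boundary between the two RAM blocks is taken from Figure 2-4, and the part of the third megaword below 002F F800h is assumed to be reserved as that figure indicates.

```python
def classify_address(addr: int, romen: int) -> str:
    """Name the memory-map region that a 32-bit word address falls in.

    romen is the level on the ROMEN pin (0 or 1). Region labels are
    illustrative only.
    """
    if not 0 <= addr <= 0xFFFF_FFFF:
        raise ValueError("address must fit in 32 bits")
    if addr <= 0x000F_FFFF:
        # First megaword: the only range that depends on ROMEN.
        return "boot ROM / reserved" if romen else "local bus"
    if addr <= 0x001F_FFFF:
        # Second megaword: peripheral space.
        return "peripherals"
    if addr <= 0x002F_FFFF:
        # Third megaword: two 1K-word RAM blocks at the top.
        if addr >= 0x002F_FC00:
            return "RAM BLK1"
        if addr >= 0x002F_F800:
            return "RAM BLK0"
        return "reserved"
    if addr <= 0x7FFF_FFFF:
        # Rest of the first 2 gigawords.
        return "local bus"
    # Second 2 gigawords.
    return "global bus"
```

Only the first branch consults romen, which mirrors the statement that the two maps differ solely in the first megaword.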


Section 3.4 (page 3-18) describes the memory maps in greater detail. Sec- 
tions 7.1, 7.2, and 7.3, beginning on page 7-3, discuss the local and global 
interfaces to these memories. The peripheral bus map and the vector loca- 
tions for reset, interrupts, and traps are also explained. 
Figure 2-4. Memory Maps

[Figure: two memory maps, (a) Internal ROM Disabled (ROMEN = 0) and
(b) Internal ROM Enabled (ROMEN = 1). Address ranges shown:

0000 0000h - 000F FFFFh   1M; (a) local bus (external), (b) boot loader ROM
                          with remaining space reserved
0010 0000h - 0010 00FFh   peripherals
0010 0100h - 001F FFFFh   reserved
0020 0000h - 002F F7FFh   reserved
002F F800h - 002F FBFFh   1K RAM BLK 0 (internal)
002F FC00h - 002F FFFFh   1K RAM BLK 1 (internal)
0030 0000h - 7FFF FFFFh   2G-3M; local bus (external)
8000 0000h - FFFF FFFFh   global bus (external)]
Figure 2-5. Peripheral Memory Map


2.2.3 Memory Addressing Modes


The TMS320C40 supports a base set of general-purpose instructions as 
well as arithmetic-intensive instructions that are particularly suited for digital 
signal processing and other numeric-intensive applications. Refer to Chap- 
ter 5 for detailed information on addressing. 


Four groups of addressing modes are provided on the TMS320C40 (major 
headings below). Each group uses two or more of several different address- 
ing types, as shown for each group in the following list: 


1) General addressing modes:

   - Register. The operand is a CPU register.

   - Immediate. The operand is a 16-bit immediate value.

   - Direct. The operand is the contents of a 32-bit address
     (concatenation of 16 bits of the data page pointer and a 16-bit
     operand).

   - Indirect. A 32-bit auxiliary register indicates the address of the
     operand.


2) Three-operand addressing modes:
   - Register (same as for general addressing mode).
   - Indirect (same as for general addressing mode).
   - Immediate (same as for general addressing mode).


3) Parallel addressing modes:
   - Register. The operand is an extended-precision register.
   - Indirect (same as for general addressing mode).


4) Branch addressing modes:
   - Register (same as for general addressing mode).
   - PC-relative. A signed 16-bit displacement or a 24-bit displacement
     is added to the PC.
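

Two of the modes above reduce to simple address arithmetic. The sketch below is illustrative Python, not device code: the function names are hypothetical, and the +1 (standard) and +3 (delayed) offsets applied to PC-relative branches are taken from the Bcond and BcondD rows of Table 2-2 rather than from this list.

```python
def direct_address(dp: int, operand: int) -> int:
    """Direct mode: 16 LSBs of the data-page pointer concatenated with
    the 16-bit operand give a 32-bit address."""
    return ((dp & 0xFFFF) << 16) | (operand & 0xFFFF)


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def pc_relative_target(pc: int, disp: int, bits: int = 16,
                       delayed: bool = False) -> int:
    """PC-relative mode: signed 16- or 24-bit displacement added to the PC.

    The offset of 1 (standard) or 3 (delayed) follows the branch rows of
    Table 2-2; the result wraps to 32 bits.
    """
    offset = 3 if delayed else 1
    return (pc + sign_extend(disp, bits) + offset) & 0xFFFF_FFFF
```

For example, with the data-page pointer holding 0030h and an operand of 1234h, direct_address returns 0030 1234h.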
2.3 Instruction Set Summary


Table 2-2 lists the TMS320C40 instruction set in alphabetical order. Each
table entry shows the instruction mnemonic, description, and operation.
Refer to Chapter 11 for a functional listing of the instructions and individual
instruction descriptions.


Table 2-2. Instruction Set Summary


Mnemonic   Description                           Operation

ABSF       Absolute value of a floating-point    |src| → Rn
           number

ADDC3      Add integers with carry (3-operand)   src1 + src2 + C → Dreg

ADDI3      Add integers (3-operand)              src1 + src2 → Dreg

ASH        Arithmetic shift                      If count ≥ 0:
                                                   (Shifted Dreg left by count) → Dreg
                                                 Else:
                                                   (Shifted Dreg right by |count|) → Dreg

ASH3       Arithmetic shift (3-operand)          If count ≥ 0:
                                                   (Shifted src left by count) → Dreg
                                                 Else:
                                                   (Shifted src right by |count|) → Dreg


LEGEND:
src     general addressing modes                 Dreg    register address (any register)
src1    three-operand addressing modes           Rn      register address (R0 - R11)
src2    three-operand addressing modes           Daddr   destination memory address
Csrc    conditional-branch addressing modes      ARn     auxiliary register n (AR0 - AR7)
Sreg    register address (any register)          cond    condition code (see Table 11-8)
count   shift value (general addressing modes)   ST      status register
SP      stack pointer                            RE      repeat end address register
GIE     global interrupt enable register         RS      repeat start register
RM      repeat mode bit                          PC      program counter
TOS     top of stack                             C       carry bit
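

The shift entries in Table 2-2 select the direction from the sign of count: a non-negative count shifts left, a negative count shifts right by |count|. The following Python sketch models that rule on 32-bit values only. It is an illustration under stated assumptions: the function names are hypothetical, status flags and any limit on the range of count are ignored, and the zero-fill right shift in lsh is the conventional meaning of "logical shift" rather than something the table spells out.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def ash(value: int, count: int) -> int:
    """Arithmetic shift: left if count >= 0, else sign-propagating right."""
    if count >= 0:
        return (value << count) & MASK32
    # Python's >> on a negative int replicates the sign bit.
    return (to_signed32(value) >> -count) & MASK32


def lsh(value: int, count: int) -> int:
    """Logical shift: left if count >= 0, else zero-filling right."""
    value &= MASK32
    if count >= 0:
        return (value << count) & MASK32
    return value >> -count
```

The two functions differ only for right shifts of values with bit 31 set, where ash fills with ones and lsh fills with zeros.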




Table 2-2. Instruction Set Summary (Continued)


Mnemonic   Description                           Operation

Bcond      Branch conditionally (standard)       If cond = true:
                                                   If Csrc is a register, Csrc → PC
                                                   If Csrc is a value, Csrc + PC + 1 → PC
                                                 Else: PC + 1 → PC

BcondAF    Branch conditionally delayed and      If cond is true:
           annul if false                          If src is a register:
                                                     src → PC
                                                   If src is a displacement:
                                                     src + PC of branch + 3 → PC
                                                 Else: If cond is false, annul execute phase
                                                   results of next 3 instructions and continue

BcondAT    Branch conditionally delayed and      If cond is true:
           annul if true                           If src is a register:
                                                     src → PC
                                                     annul execute phase results of next 3
                                                     instructions
                                                   If src is a displacement:
                                                     src + PC of branch + 3 → PC
                                                     annul execute phase results of next 3
                                                     instructions
                                                 Else: continue

BcondD     Branch conditionally (delayed)        If cond = true:
                                                   If Csrc is a register, Csrc → PC
                                                   If Csrc is a value, Csrc + PC + 3 → PC
                                                 Else: PC + 1 → PC

BR         Branch unconditionally (standard)     Csrc + PC + 1 → PC

BRD        Branch unconditionally (delayed)      Csrc + PC + 3 → PC

CALL       Call subroutine                       PC + 1 → TOS
                                                 Csrc + PC + 1 → PC

CALLcond   Call subroutine conditionally         If cond = true:
                                                   PC + 1 → TOS
                                                   If Csrc is a register, Csrc → PC
                                                   If Csrc is a value, Csrc + PC → PC
                                                 Else: PC + 1 → PC

CMPF       Compare floating-point values         Set flags on Rn - src

CMPF3      Compare floating-point values         Set flags on src1 - src2
           (3-operand)

CMPI       Compare integers                      Set flags on Dreg - src

CMPI3      Compare integers (3-operand)          Set flags on src1 - src2

DBcond     Decrement and branch conditionally    ARn - 1 → ARn
           (standard)                            If cond = true and ARn ≥ 0:
                                                   If Csrc is a register, Csrc → PC
                                                   If Csrc is a value, Csrc + PC + 1 → PC
                                                 Else: PC + 1 → PC


Table 2-2. Instruction Set Summary (Continued)


Mnemonic   Description                           Operation

DBcondD    Decrement and branch conditionally    ARn - 1 → ARn
           (delayed)                             If cond = true and ARn ≥ 0:
                                                   If Csrc is a register, Csrc → PC
                                                   If Csrc is a value, Csrc + PC + 3 → PC
                                                 Else: PC + 1 → PC

FIX        Convert floating-point value to       Fix(src) → Dreg
           integer

FLOAT      Convert integer to floating-point     Float(src) → Rn
           value

FRIEEE     Convert from IEEE format              Convert src from IEEE format → Dreg

IACK       Interrupt acknowledge                 Perform a dummy read with IACK = 0
                                                 At end of dummy read, set IACK = 1

IDLE       Idle until interrupt                  PC + 1 → PC, then idle until next interrupt

LAJ        Link and jump                         PC + 4 → R11
                                                 PC of LAJ + 3 + src → PC

LAJcond    Link and jump conditionally           If cond is true and src is a register:
                                                   PC of LAJcond + 4 → R11 and src → PC
                                                 If cond is true and src is a displacement:
                                                   PC of LAJcond + 4 → R11 and
                                                   src + PC of LAJcond + 3 → PC
                                                 Else: continue

LATcond    Link and trap conditionally           If cond is true:
                                                   ST(GIE) → ST(PGIE)
                                                   ST(CF) → ST(PCF)
                                                   0 → ST(GIE)
                                                   1 → ST(CF)
                                                   PC of LATcond + 4 → R11
                                                   trap vector N → PC
                                                 Else: continue

LBb        Load byte                             Sign-extended byte (byte 3,2,1,0) of
                                                 src → Dreg

LBUb       Load byte unsigned                    Unsigned byte (byte 3,2,1,0) of src → Dreg

LDA        Load address register

LDE        Load floating-point exponent          src(exponent) → Rn(exponent)

LDEP       Load integer from expansion           src → Dreg
           register file to primary register
           file
Table 2-2. Instruction Set Summary (Continued)


Mnemonic   Description                           Operation

LDF        Load floating-point value             src → Rn

LDFcond    Load floating-point value             If cond = true, src → Rn
           conditionally                         Else, Rn is not changed

LDFI       Load floating-point value,            Signal interlocked operation
           interlocked                           src → Rn

LDHI       Load 16 MSBs with 16-bit immediate    src → 16 MSBs of Dreg

LDI        Load integer                          src → Dreg

LDIcond    Load integer conditionally            If cond = true, src → Dreg
                                                 Else, Dreg is not changed

LDII       Load integer, interlocked             Signal interlocked operation
                                                 src → Dreg

LDM        Load floating-point mantissa          src (mantissa) → Rn (mantissa)

LDP        Load data-page pointer                src → data page pointer

LHw        Load half word                        Sign-extended half word of src → Dreg

LHUw       Load half word unsigned               Unsigned half word of src → Dreg

LSH        Logical shift                         If count ≥ 0:
                                                   (Dreg left-shifted by count) → Dreg
                                                 Else:
                                                   (Dreg right-shifted by |count|) → Dreg

LSH3       Logical shift (3-operand)             If count ≥ 0:
                                                   (src left-shifted by count) → Dreg
                                                 Else:
                                                   (src right-shifted by |count|) → Dreg

LWLct      Load word, left shifted               src << (0,1,2,3) bytes and merged with
                                                 Dreg → Dreg

LWRct      Load word, right shifted              src >> (0,1,2,3) bytes and merged with
                                                 Dreg → Dreg

MBct       Merge byte, left shifted              8 LSBs of src << (0,1,2,3) bytes and merged
                                                 with Dreg → Dreg

MHct       Merge half word, left shifted         16 LSBs of src << (0,1) half words and
                                                 merged with Dreg → Dreg

MPYF3      Multiply floating-point values        src1 × src2 → Rn
           (3-operand)
Table 2-2. Instruction Set Summary (Continued)


Mnemonic   Description                           Operation

NEGF       Negate floating-point value           0 - src → Rn

NOP        No operation

NORM       Normalize floating-point value        Normalize (src) → Rn

OR         Bitwise logical-OR                    Dreg OR src → Dreg

OR3        Bitwise logical-OR (3-operand)        src1 OR src2 → Dreg

RCPF       Reciprocal of floating-point value    16-bit reciprocal of src → dst

RETScond   Return from subroutine                If cond = true or missing:
           conditionally                           *SP-- → PC
                                                 Else: continue
Table 2-2. Instruction Set Summary (Continued)


Mnemonic   Description                           Operation

ROL        Rotate left

RORC       Rotate right through carry            Dreg rotated right 1 bit through
                                                 carry → Dreg

RPTB       Repeat block of instructions          src → RE
                                                 1 → ST (RM)
                                                 Next PC → RS

RPTBD      Repeat block delayed                  If src is an immediate value (displacement):
                                                   src + PC + 3 → RE
                                                 Else:
                                                   src → RE
                                                 1 → ST (RM)
                                                 PC of RPTBD + 4 → RS

RPTS       Repeat single instruction             src → RC
                                                 1 → ST (RM)
                                                 Next PC → RS
                                                 Next PC → RE

RSQRF      Reciprocal of square root floating    16-bit reciprocal of square root of
           point                                 src → Dreg

SIGI       Signal, interlocked                   Signal interlocked operation
                                                 Wait for interlock acknowledge
                                                 Clear interlock

STF        Store floating-point value            Rn → Daddr

STFI       Store floating-point value,           Rn → Daddr
           interlocked                           Signal end of interlocked operation

STI        Store integer

STII       Store integer, interlocked            Sreg → Daddr
                                                 Signal end of interlocked operation

STIK       Store integer immediate value

SUBB       Subtract integers with borrow         Dreg - src - C → Dreg

SUBB3      Subtract integers with borrow         src1 - src2 - C → Dreg
           (3-operand)

SUBC       Subtract integers conditionally       If Dreg - src ≥ 0:
                                                   [(Dreg - src) << 1] OR 1 → Dreg
                                                 Else: Dreg << 1 → Dreg
Table 2-2. Instruction Set Summary (Concluded)


Mnemonic   Description                           Operation

SUBRI      Subtract reverse integer              src - Dreg → Dreg

SWI        Software interrupt                    Perform emulator interrupt sequence

TOIEEE     Convert to IEEE format                Convert src to IEEE format → dst

TRAPcond   Trap conditionally                    If cond = true or missing:
                                                   Next PC → *++SP
                                                   Trap vector N → PC
                                                   0 → ST (GIE)
                                                 Else: continue

TSTB       Test bit fields                       Dreg AND src

TSTB3      Test bit fields (3-operand)           src1 AND src2

XOR        Bitwise exclusive-OR                  Dreg XOR src → Dreg

XOR3       Bitwise exclusive-OR (3-operand)      src1 XOR src2 → Dreg


LEGEND:
src     general addressing modes                 Dreg    register address (any register)
src1    three-operand addressing modes           Rn      register address (R0 - R11)
src2    three-operand addressing modes           Daddr   destination memory address
Csrc    conditional-branch addressing modes      ARn     auxiliary register n (AR0 - AR7)
Sreg    register address (any register)          addr    24-bit immediate address (label)
count   shift value (general addressing modes)   cond    condition code (see Table 11-8)
SP      stack pointer                            ST      status register
GIE     global interrupt enable register         RE      repeat end address register
RM      repeat mode bit                          RS      repeat start register
TOS     top of stack                             PC      program counter
C       carry bit
Table 2-3. Parallel Instruction Set Summary


Mnemonic   Description                           Operation

Parallel Arithmetic With Store Instructions

ABSF       Absolute value of a floating-point    |src2| → dst1
|| STF                                           || src3 → dst2

ABSI       Absolute value of an integer          |src2| → dst1
|| STI                                           || src3 → dst2

ADDF3      Add floating-point                    src1 + src2 → dst1
|| STF                                           || src3 → dst2

ADDI3      Add integer                           src1 + src2 → dst1
|| STI                                           || src3 → dst2

AND3       Bitwise logical-AND                   src1 AND src2 → dst1
|| STI                                           || src3 → dst2

ASH3       Arithmetic shift                      If count ≥ 0:
|| STI                                             src2 << count → dst1
                                                   || src3 → dst2
                                                 Else:
                                                   src2 >> |count| → dst1
                                                   || src3 → dst2

FIX        Convert floating-point to integer     Fix(src2) → dst1
|| STI                                           || src3 → dst2

FLOAT      Convert integer to floating-point     Float(src2) → dst1
|| STF                                           || src3 → dst2

FRIEEE     Parallel FRIEEE and STF               Convert src2 from IEEE format → dst1
|| STF                                           || src3 → dst2

LDF        Load floating-point                   src2 → dst1
|| STF                                           || src3 → dst2

LDI        Load integer                          src2 → dst1
|| STI                                           || src3 → dst2

LSH3       Logical shift                         If count ≥ 0:
|| STI                                             src2 << count → dst1
                                                   || src3 → dst2
                                                 Else:
                                                   src2 >> |count| → dst1
                                                   || src3 → dst2


LEGEND (for parallel instructions):
src1    register addr (R0 - R11)                 src2    indirect addr (disp = 0, 1, IR0, IR1)
src3    register addr (R0 - R11)                 src4    indirect addr (disp = 0, 1, IR0, IR1)
dst1    register addr (R0 - R11)                 dst2    indirect addr (disp = 0, 1, IR0, IR1)
op3     register addr (R0 or R1)                 op6     register addr (R2 or R3)
op1, op2, op4, op5: Two of these operands must be specified using register addr,
        and two must be specified using indirect.
Table 2-3. Parallel Instruction Set Summary (Continued)


Mnemonic   Description                           Operation

Parallel Arithmetic With Store Instructions (Concluded)

MPYF3      Multiply floating-point and store     src1 × src2 → dst1
|| STF                                           || src3 → dst2

MPYI3      Multiply integer                      src1 × src2 → dst1
|| STI                                           || src3 → dst2

NEGF       Negate floating-point                 0 - src2 → dst1
|| STF                                           || src3 → dst2

NEGI       Negate integer                        0 - src2 → dst1
|| STI                                           || src3 → dst2

NOT        Complement                            NOT src1 → dst1
|| STI                                           || src3 → dst2

OR3        Bitwise logical-OR                    src1 OR src2 → dst1
|| STI                                           || src3 → dst2

STF        Store floating-point                  src1 → dst1
|| STF                                           || src3 → dst2

STI        Store integer                         src1 → dst1
|| STI                                           || src3 → dst2

SUBF3      Subtract floating-point               src1 - src2 → dst1
|| STF                                           || src3 → dst2

SUBI3      Subtract integer                      src1 - src2 → dst1
|| STI                                           || src3 → dst2

TOIEEE     Convert to IEEE floating-point        Convert src2 to IEEE format → dst1
|| STF     format                                || src3 → dst2

XOR3       Bitwise exclusive-OR                  src1 XOR src2 → dst1
|| STI                                           || src3 → dst2
Table 2-3. Parallel Instruction Set Summary (Concluded)


Mnemonic   Description                           Operation

Parallel Load Instructions

LDF        Load floating-point                   src2 → dst1
|| LDF                                           || src4 → dst2

LDF        Load floating point and store         src2 → dst1
|| STF     floating point                        || src3 → dst2

LDI        Load integer                          src2 → dst1
|| LDI                                           || src4 → dst2

LSH3       Logical shift, 3 operand, and         If count ≥ 0:
|| STI     store integer                           src2 << count → dst1
                                                 Else:
                                                   src2 >> |count| → dst1
                                                 || src3 → dst2

Parallel Multiply And Add/Subtract Instructions

                                                 op1 × op2 → op3
2.4 Internal Bus Operation


A large portion of the TMS320C40’s high performance is due to internal busing
and parallelism. Separate buses allow for parallel program fetches, data
accesses, and DMA accesses:

- program buses PADDR and PDATA

- data buses DADDR1, DADDR2, and DDATA

- DMA buses DMAADDR and DMADATA


These buses connect all of the physical spaces (on-chip memory, off-chip
memory, and on-chip peripherals) supported by the TMS320C40.
Figure 2-3 shows these internal buses and their connection to on-chip and
off-chip memory blocks.


The program counter (PC) is connected to the 32-bit program address bus 
(PADDR). The instruction register (IR) is connected to the 32-bit program 
data bus (PDATA). These buses can fetch a single instruction word every 
machine cycle. 


The 32-bit data address buses (DADDR1 and DADDR2) and the 32-bit data
bus (DDATA) support two data memory accesses every machine cycle.
The DDATA bus carries data to the CPU over the CPU1 and CPU2 buses.
The CPU1 and CPU2 buses can carry two data memory operands to the
multiplier, ALU, and register file every machine cycle. Also internal to the
CPU are register buses REG1 and REG2, which can carry two data values
from the register file to the multiplier and ALU every machine cycle.
Figure 2-2 shows the buses internal to the CPU section of the processor.


The DMA controller is supported with a 32-bit address bus (DMAADDR) and 
a 32-bit data bus (DMADATA). These buses allow the DMA to perform 
memory accesses in parallel with the memory accesses occurring from the 
data and program buses. 




2.5 External Bus Operation 


The TMS320C40 provides two identical external interfaces: the global 
memory interface and the local memory interface. Each consists of a 32-bit 
data bus, a 31-bit address bus, and two sets of control signals. Both buses 
can be used to address external program/data memory or I/O space. The 
buses also have external RDY signals for wait-state generation with wait 
states inserted under software control. Chapter 7 covers external bus oper- 
ation. 


2.5.1 Interrupts 


The TMS320C40 supports four external interrupts (IIOF3-0), a number of
internal interrupts, a nonmaskable external NMI interrupt, and a nonmaskable
external RESET signal, which sets the processor to a known state. The
DMA and communication ports have their own internal interrupts. When the
CPU responds to the interrupt, the IACK pin can be used to signal an external
interrupt acknowledge. Section 6.7 (beginning on page 6-23) covers
RESET and interrupt processing.


2.5.2 Interlocked Instructions 


For multiple processors to access global memory and share data in
a coherent manner, arbitration is necessary. This arbitration (handshaking)
is the purpose of the TMS320C40’s interlocked operations, which are handled
through the interlocked instructions (explained in Section 6.4 on page 6-11).
2.6 Peripherals


All TMS320C40 peripherals are controlled through memory-mapped registers
on a dedicated peripheral bus. This peripheral bus is composed of a
32-bit data bus and a 32-bit address bus and permits straightforward
communication to the peripherals. The TMS320C40 peripherals include two
timers and six communication ports. Figure 2-6 shows the peripherals
with associated buses and signals.


Figure 2-6. Peripheral Modules 
2.6.1 Communication Ports


Six high-speed communication ports provide rapid processor-to-processor
communication through each port’s dedicated communication interfaces.
Coupled with the ’C40’s two memory interfaces (global and local), this allows
you to construct a parallel processor system that attains optimum system
performance by distributing tasks among several processors.
Each ’C40 can pass the results of its work to another, enabling each ’C40
to continue working. Chapter 8 explains communication port operation in
detail.


Communication port features:

- 160-megabit per second (20 Mbytes or 5 Mwords per second)
  bidirectional data transfer operations (at 40-ns cycle time)

- direct (glueless) processor-to-processor communication via eight data
  lines and four control lines

- buffering of all data transfers, both input and output

- automatic arbitration to ensure communication synchronization

- synchronization between the CPU or direct-memory access (DMA)
  coprocessor and the six communication ports via internal interrupts and
  internal ready signals
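

The three transfer rates quoted in the first feature are the same figure in different units. The short Python sketch below only reproduces that unit conversion, assuming 8-bit bytes and the ’C40's 32-bit (4-byte) word; the function name port_rates is hypothetical.

```python
BITS_PER_BYTE = 8
BYTES_PER_WORD = 4  # one 'C40 word is 32 bits


def port_rates(bits_per_second: int) -> tuple[int, int]:
    """Convert a bit rate into (bytes per second, 32-bit words per second)."""
    bytes_per_second = bits_per_second // BITS_PER_BYTE
    words_per_second = bytes_per_second // BYTES_PER_WORD
    return bytes_per_second, words_per_second
```

Calling port_rates(160_000_000) gives 20,000,000 bytes per second and 5,000,000 words per second, matching the figures in the list.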


2.6.2 Direct Memory Access (DMA)


The six channels of the on-chip Direct Memory Access (DMA) coprocessor
can read from or write to any location in the memory map without interfering
with the operation of the CPU. This allows interfacing to slow external
memories and peripherals without reducing throughput to the CPU. The DMA
coprocessor contains its own address generators, source and destination
registers, and transfer counter. Dedicated DMA address and data buses
minimize conflicts between the CPU and the DMA coprocessor.
A DMA operation consists of a block or single-word transfer to or from
memory. A key feature of the DMA coprocessor is its ability to automatically
reinitialize each channel following a data transfer. Refer to Chapter 9 for
detailed information on the DMA coprocessor.


2.6.3 Timers 


The two timer modules are general-purpose 32-bit timer/event counters
with two signaling modes and internal or external clocking. They can signal
internally to the ’C40 or externally to the outside world at specified intervals,
or they can count external events. Each timer has an I/O pin that can be used
as an input clock to the timer, as an output signal driven by the timer, or as
a general-purpose I/O pin. Timers are described in detail in Section 9.10 on
page 9-45.
Chapter 3


CPU Registers, Memory, and Cache


The CPU primary register file contains 32 registers that can be used as
operands by the multiplier and ALU (arithmetic logic unit). The register file 
includes the auxiliary registers, extended-precision registers, and index 
registers. These registers support addressing, floating-point/integer opera- 
tions, stack management, processor status, block repeats, branching, and 
interrupts. 


The CPU expansion register file contains two registers — the interrupt 
vector table pointer (IVTP) and the trap vector table pointer (TVTP). 


The TMS320C40 accesses a total memory space of 4G (giga = 1 billion) 
32-bit words (16 gigabytes) of program, data, and I/O space. Two internal 
RAM blocks of 1K x 32 bits each (4K bytes) and an internal ROM block con- 
taining a boot loader permit two accesses per block in a single cycle. 


A 128 x 32-bit instruction cache stores often-repeated sections of code. The
cache greatly reduces the number of off-chip accesses, allowing code to be
stored off-chip in slower, lower-cost memories without degrading performance.
The cache also speeds data fetches to the same physical space as
the program by not burdening the bus with program instruction fetches.
Three bits in the CPU status register control the clear, enable, or freeze of
the cache.


This chapter describes in detail each of the CPU registers, the memory 
maps, and the instruction cache. Major topics are as follows: 


Section                                                             Page
3.1  CPU Primary Register File                                      3-3
     - Extended-Precision Registers (R0-R11)                        3-4
     - Auxiliary Registers (AR0-AR7)                                3-5
     - Data-Page Pointer (DP)                                       3-5
     - Index Registers (IR0, IR1)                                   3-5
     - Block-Size Register (BK)                                     3-5
     - System Stack Pointer (SP)                                    3-5
     - Status Register (ST)                                         3-5
     - DMA Interrupt Enable Register (DIE)                          3-8
     - Internal Interrupt Enable Register (IIE)                     3-10
     - Interrupt Flag Register (IIF) Controls External Pins
       IIOF(3-0), Timer/DMA Flags                                   3-12
     - Block-Repeat (RS, RE) and
       Repeat-Count (RC) Registers                                  3-14
     - Program Counter (PC)                                         3-14
     - Reserved Bits and Compatibility                              3-14
3.2  CPU Expansion Register File                                    3-15
     - CPU Expansion Registers                                      3-15
     - Trap Vector Table (TVT)                                      3-15
3.3  Reset Vector Mapping                                           3-17
3.4  Memory                                                         3-18
     - Memory Maps                                                  3-19
     - Peripheral Bus Memory Map                                    3-20
3.5  Instruction Cache Architecture                                 3-25
     - Cache Algorithm                                              3-27
     - Cache and System Memory                                      3-28
     - Cache Control Bits                                           3-29
3.1 CPU Primary Register File


The TMS320C40 provides 32 registers in a multiport register file that is
tightly coupled to the CPU. The PC (program counter) is not included in the
32 registers. The registers’ names and assigned functions are listed in
Table 3-1.


All of these registers can be used as operands by the multiplier and ALU,
and can be used as general-purpose 32-bit registers. However, the registers
also have some special functions for which they are particularly appropriate.
For example, the 12 extended-precision registers are especially
well suited for maintaining extended-precision floating-point results. The
eight auxiliary registers support a variety of indirect addressing modes and
can be used as general-purpose 32-bit integer and logical registers. The
remaining registers provide system functions such as addressing, stack
management, processor status, interrupts, and block repeat. Refer to Chapter
5 for detailed information and examples of the use of CPU registers in
addressing.


Table 3-1. CPU Primary Register File

Register
Name       Assigned Function

R0         Extended-precision register 0
R1         Extended-precision register 1
R2         Extended-precision register 2
R3         Extended-precision register 3
R4         Extended-precision register 4
R5         Extended-precision register 5
R6         Extended-precision register 6
R7         Extended-precision register 7
R8         Extended-precision register 8
R9         Extended-precision register 9
R10        Extended-precision register 10
R11        Extended-precision register 11

AR0        Auxiliary register 0
AR1        Auxiliary register 1
AR2        Auxiliary register 2
AR3        Auxiliary register 3
AR4        Auxiliary register 4
AR5        Auxiliary register 5
AR6        Auxiliary register 6
AR7        Auxiliary register 7

DP         Data-page pointer
IR0        Index register 0
IR1        Index register 1
BK         Block-size register
SP         System stack pointer

ST         Status register
DIE        DMA coprocessor interrupt enable
IIE        Internal-interrupt enable register
IIF        IIOF flag register (IIOF3-0, timers, DMA)

RS         Repeat start address
RE         Repeat end address
RC         Repeat counter


3.1.1 Extended-Precision Registers (R0-R11)


The 12 extended-precision registers (R0-R11) can store and support
operations on 32-bit integer and 40-bit floating-point numbers. These
registers consist of two separate and distinct regions:


- bits 39-32: dedicated to storage of the exponent (e) of the floating-point
  number.


- bits 31-0: store the mantissa of the floating-point number:
  - bit 31: sign bit (s),
  - bits 30-0: the fraction (f).


Any instruction that assumes the operands are floating-point numbers uses
bits 39-0. Figure 3-1 illustrates the storage of 40-bit floating-point numbers
in the extended-precision registers.


Figure 3-1. Extended-Precision Register Floating-Point Format

  bits 39-32      bit 31      bits 30-0
  exponent (e)    sign (s)    fraction (f)
                  |<-------- mantissa -------->|


For integer operations, bits 31-0 of the extended-precision registers contain
the integer (signed or unsigned). Any instruction that assumes the operands
are either signed or unsigned integers uses only bits 31-0. Bits 39-32
remain unchanged. This is true for all shift operations. The storage of 32-bit
integers in the extended-precision registers is shown in Figure 3-2.
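
The two views of an extended-precision register can be pictured with a short model. The following Python sketch is illustrative only: a plain integer stands in for one 40-bit register, and the helper names split_fp40 and write_integer are hypothetical, not part of any TI tool.

```python
MASK40 = (1 << 40) - 1        # one 40-bit extended-precision register
EXP_MASK = 0xFF << 32         # bits 39-32: exponent (e)


def split_fp40(reg):
    """Return the (exponent, sign, fraction) fields of a 40-bit register image."""
    reg &= MASK40
    exponent = (reg >> 32) & 0xFF     # bits 39-32
    sign = (reg >> 31) & 0x1          # bit 31
    fraction = reg & 0x7FFFFFFF       # bits 30-0
    return exponent, sign, fraction


def write_integer(reg, value):
    """Model an integer result: bits 31-0 are replaced, bits 39-32 are kept."""
    return (reg & EXP_MASK) | (value & 0xFFFFFFFF)
```

The model only separates the raw fields; it does not interpret them as a floating-point value (see Chapter 4 for the formats).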


Figure 3-2. Extended-Precision Register Integer Format

  bits 39-32      bits 31-0
  unchanged       signed or unsigned integer


3.1.2 Auxiliary Registers (AR0-AR7)


The eight 32-bit auxiliary registers (AR0-AR7) can be accessed by the CPU
and modified by the two auxiliary register arithmetic units (ARAUs). The
primary function of the auxiliary registers is the generation of 32-bit addresses.
However, they can also operate as loop counters in indirect addressing or
as 32-bit general-purpose registers that can be modified by the multiplier
and ALU. Refer to Chapter 5 for detailed information and examples of the
use of auxiliary registers in addressing.


3.1.3 Data-Page Pointer (DP)


The data-page pointer (DP) is a 32-bit register whose 16 LSBs are used 
by the direct addressing mode as a pointer to the page of data being ad- 
dressed. Data pages are 64K words long with a total of 64K (65,536) pages. 
Bits 31-16 are reserved; they are always read as zeroes and should not 
be modified by writing to the register. The DP can be loaded by using 
the LDP pseudo-instruction or the LDI instruction. Figure 5—1 on page 5-4 
describes this register’s function. 
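
With 64K pages of 64K words each, a direct address can be pictured as the 16 LSBs of the DP supplying the upper half of a 32-bit address and a 16-bit in-page offset supplying the lower half. The Python sketch below assumes exactly that concatenation (Chapter 5 gives the authoritative description); direct_address is a hypothetical helper name.

```python
def direct_address(dp, offset):
    """Form a 32-bit address from the DP page number and a 16-bit offset.

    Assumption: the page (16 LSBs of DP) is concatenated with the offset.
    Bits 31-16 of DP are reserved and are ignored here.
    """
    page = dp & 0xFFFF
    return (page << 16) | (offset & 0xFFFF)
```

For example, page 002Fh with offset F800h selects address 002F F800h, the first word of RAM block 0.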


3.1.4 Index Registers (IR0, IR1)


The 32-bit index registers (IR0 and IR1) are used by the auxiliary register
arithmetic unit (ARAU) for indexing the address. IR0 is also used for
bit-reversed addressing. Refer to Chapter 5 for detailed information and examples
of the use of index registers in addressing. (Subsection 5.1.3 on page 5-5
covers use of the index registers in indirect addressing; see the examples
starting on page 5-12. Section 5.4 on page 5-30 describes using IR0 with
bit-reversed addressing.)


3.1.5 Block-Size Register (BK)


The 32-bit block-size register (BK) is used by the ARAU in circular
addressing to specify the data block size (see Section 5.3 on page 5-25).


3.1.6 System Stack Pointer (SP)


The system stack pointer (SP) is a 32-bit register that contains the address 
of the top of the system stack. The SP always points to the last element 
pushed onto the stack. The SP is manipulated by interrupts, traps, calls, re- 
turns, and the PUSH, PUSHF, POP, and POPF instructions. Pushes and 
pops of the stack perform preincrement and postdecrement, respectively, 
on all 32 bits of the SP. Refer to Section 5.5 on page 5-31 for information 
about system stack management. 
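
The preincrement/postdecrement behavior can be sketched as a small model. This is an illustration under simple assumptions, not a description of the hardware: memory is a Python dictionary, and the class name SystemStack is hypothetical.

```python
class SystemStack:
    """Toy model of the system stack: SP points to the last element pushed."""

    def __init__(self, sp):
        self.sp = sp & 0xFFFFFFFF
        self.memory = {}

    def push(self, value):
        self.sp = (self.sp + 1) & 0xFFFFFFFF    # preincrement on all 32 bits
        self.memory[self.sp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory[self.sp]
        self.sp = (self.sp - 1) & 0xFFFFFFFF    # postdecrement on all 32 bits
        return value
```

Because the SP is incremented before the store, the stack grows toward higher addresses and the SP always holds the address of the most recent entry.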


3.1.7 Status Register (ST)


The status register (ST) contains global information relating to the CPU 
state. Typically, operations set the condition flags of the status register ac- 
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cording to whether the result is zero, negative, etc. This includes register 
load and store operations as well as arithmetic and logical functions. How- 
ever, when the ST is loaded, the contents of the load instruction’s source 
operand replace the ST current contents bit for bit, regardless of the state 
of any bit(s) in the source operand. Therefore, following an ST load, the
contents of the ST are identical to the contents of the source operand. This
allows the status register to be saved and restored easily. At system reset,
0 is written to this register.


The format of the status register is shown in Figure 3-3. Table 3—2 defines 
the status register bits, their names, and functions. 


Figure 3-3. Status Register

[Bit-field diagram; the individual fields are defined in Table 3-2.]

NOTE: xx = reserved bit.
      R = read, W = write.


Table 3-2. Status Register Bits Summary


  Bit Name    Function

  C           Carry condition flag. (See note.)

  V           Overflow condition flag. (See note.)

  N           Negative condition flag. (See note.)

  OVM         Overflow mode flag. This flag affects only integer operations.

              If OVM = 0, the overflow mode is turned off; integer results that
              overflow are treated in no special way.

              If OVM = 1,

              a) integer results overflowing in the positive direction are set to
                 the most positive 32-bit twos-complement number (7FFF FFFFh).

              b) integer results overflowing in the negative direction are set to
                 the most negative 32-bit twos-complement number (8000 0000h).

              Note that the functions of bits V and LV are independent of the
              setting of OVM.

  RM          Repeat mode flag. If RM = 1, the PC is being modified in either the
              repeat-block or repeat-single mode.


Note: The seven condition flags (ST bits 0-6) are defined in Section 11.2 on page 11-10.
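
The effect of OVM on an integer result can be sketched as follows. This is a minimal model of a 32-bit addition only; the function name add_integer is hypothetical, and the wrap-around result used for OVM = 0 is an assumption standing in for "treated in no special way".

```python
INT_MAX = 0x7FFFFFFF       # most positive 32-bit twos-complement number
INT_MIN = -0x80000000      # most negative 32-bit twos-complement number


def add_integer(a, b, ovm):
    """Add two signed 32-bit values; return (result, overflowed)."""
    exact = a + b
    overflowed = exact > INT_MAX or exact < INT_MIN
    if overflowed and ovm:
        # OVM = 1: saturate in the direction of the overflow.
        result = INT_MAX if exact > INT_MAX else INT_MIN
    else:
        # OVM = 0 (assumed behavior): keep the low 32 bits of the result.
        result = ((exact + 0x80000000) & 0xFFFFFFFF) - 0x80000000
    return result, overflowed
```

The overflow indication is returned regardless of ovm, mirroring the statement that bits V and LV are independent of the setting of OVM.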
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Table 3-2. 


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Status Register Bits Summary (Continued) 


  Bit Field
  Name        Function

  PCF         Previous state of bit CF. When a trap executes or an interrupt is
              taken, bit CF is set to 1. When this occurs, the PCF bit is set to
              the CF bit's value before the trap or interrupt. Note that the RETI
              and RETID instructions copy PCF to the CF bit.

  CF          Cache freeze. Set CF = 1 to freeze the cache (the cache is not
              updated), including LRU (least recently used) stack manipulation.
              If the cache is enabled (CE = 1), fetches from the cache are allowed,
              but modification of the cache contents is not allowed. Cache
              clearing (CC = 1) is allowed whether CF = 1 or CF = 0. At reset,
              this bit is set to zero. CF is set to one when a trap or interrupt is
              taken. Also, the RETI and RETID instructions copy PCF to the CF bit.

  CE          Cache enable. Set CE = 1 to enable the cache, allowing the cache to
              be used according to the LRU (least recently used) cache algorithm.
              Set CE = 0 to disable the cache, preventing cache updates or
              modifications (thus, no cache fetches can be made). At reset, 0 is
              written to this bit. Cache clearing (CC = 1) is allowed when CE = 0.
              The CE and CF bits combine as follows:

              CE   CF   Effect
              0    0    Cache not enabled
              0    1    Cache not enabled
              1    0    Cache enabled and not frozen
              1    1    Cache enabled but frozen (cache read only)

  CC          Cache clear. CC = 1 invalidates all entries in the cache (contents
              not guaranteed, "garbage"). This bit is always cleared after it is
              written to and thus is always read as 0. At reset, 0 is written to
              this bit. All cache P flags = 0 when the cache is cleared.

  GIE         Global interrupt enable. If GIE = 1, the CPU responds to an enabled
              interrupt. If GIE = 0, the CPU does not respond to an enabled
              interrupt (when a trap executes or an interrupt is taken, bit GIE is
              set to 0). This bit does not affect interrupts on the NMI pin. The
              IDLE, LAT, RETI, RETID, and TRAP instructions affect this bit's
              value.

  PGIE        Previous state of bit GIE. When a trap executes or an interrupt is
              taken, bit GIE is set to 0. When this occurs, the PGIE bit is set to
              the GIE bit's value before the trap or interrupt. Note that the
              RETIcond and RETIcondD instructions copy PGIE to the GIE bit. At
              reset, this bit is set to 0.

  SET COND    This bit determines how condition flags (ST bits 0-6) are set:

              If SET COND = 0, condition flags are set if the operation's target
              is any extended-precision register (R0-R11); this is compatible
              with the TMS320C30. This bit is set to 0 at reset.

              If SET COND = 1, condition flags are set if the target of the
              operation is any register in the primary register file except the
              status register.

              Condition flags are always set when a CMPF, CMPI, CMPF3, CMPI3,
              TSTB, or TSTB3 instruction is executed.

  ANALYSIS    Analysis mode: state information for emulation. Read only.

  (reserved)  Value undefined. Read only. Reserved for an identification value.
              This value is set by Texas Instruments (e.g., to identify device
              types and revisions).
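
The save-and-restore relationship between GIE/PGIE and CF/PCF can be sketched as a pair of state transitions. This model tracks only those four bits by name (no bit positions are assumed), and the function names enter_trap and return_from_interrupt are hypothetical.

```python
def enter_trap(st):
    """Model the ST changes when a trap executes or an interrupt is taken."""
    new = dict(st)
    new["PGIE"], new["GIE"] = st["GIE"], 0   # save GIE, then disable interrupts
    new["PCF"], new["CF"] = st["CF"], 1      # save CF, then freeze the cache
    return new


def return_from_interrupt(st):
    """Model the ST changes made by a return-from-interrupt instruction."""
    new = dict(st)
    new["GIE"] = st["PGIE"]                  # PGIE is copied to GIE
    new["CF"] = st["PCF"]                    # PCF is copied to CF
    return new
```

A trap taken with GIE = 1 and CF = 0 therefore runs with interrupts disabled and the cache frozen, and the return restores both bits to their earlier values.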
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3.1.8 DMA Coprocessor Interrupt Enable Register (DIE) 


The 32-bit DMA interrupt enable register (DIE), shown in Figure 3-4, is
broken into six subfields that determine which interrupts can be used to
control the synchronization for each of the six DMA coprocessor channels. At
reset, all zeroes are written to the register.


Figure 3-4. DMA Interrupt Enable Register Bit Functions

  Bits      Field         Access
  31-29     DMA5 WRITE    R/W
  28-26     DMA5 READ     R/W
  25-23     DMA4 WRITE    R/W
  22-20     DMA4 READ     R/W
  19-17     DMA3 WRITE    R/W
  16-14     DMA3 READ     R/W
  13-11     DMA2 WRITE    R/W
  10-8      DMA2 READ     R/W
  7-6       DMA1 WRITE    R/W
  5-4       DMA1 READ     R/W
  3-2       DMA0 WRITE    R/W
  1-0       DMA0 READ     R/W


R = Read, W = Write


Table 3-3 summarizes the interrupt activity for each of the four possible
combinations of two-bit values in DMA0 and DMA1 (bottom of Figure 3-4).


Likewise, Table 3-4 (page 3-9) summarizes the interrupts enabled by
three-bit values in DMA2 through DMA5.
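
Locating a field in the DIE register is a matter of simple arithmetic. The sketch below assumes the layout of Figure 3-4: two-bit READ/WRITE fields for DMA0 and DMA1 in bits 7-0, and three-bit READ/WRITE fields for DMA2 through DMA5 in bits 31-8, with each READ field below its WRITE field. The helper names die_field and set_die_field are hypothetical.

```python
def die_field(channel, direction):
    """Return (lsb, width) of the DMA<channel> READ or WRITE field in DIE."""
    if channel not in range(6) or direction not in ("READ", "WRITE"):
        raise ValueError("channel must be 0-5 and direction READ or WRITE")
    is_write = 1 if direction == "WRITE" else 0
    if channel < 2:
        width = 2
        lsb = 4 * channel + 2 * is_write
    else:
        width = 3
        lsb = 8 + 6 * (channel - 2) + 3 * is_write
    return lsb, width


def set_die_field(die, channel, direction, value):
    """Return a new 32-bit DIE image with one field replaced by value."""
    lsb, width = die_field(channel, direction)
    if value >> width:
        raise ValueError("value does not fit in the field")
    mask = ((1 << width) - 1) << lsb
    return (die & ~mask & 0xFFFFFFFF) | (value << lsb)
```

For instance, writing 001b to the DMA2 READ field sets bit 8 and leaves every other field untouched.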


Note: DMA Coprocessor Uses Signals to Synchronize


The interrupts in Table 3-3 and Table 3-4 (ICRDYx, OCRDYx, TIM0,
etc.) are not vectored. The DMA uses these as signals to synchronize
DMA coprocessor transfers. This is explained in Section 9.9 on page
9-40.




Table 3-3. DMA Channels 0 and 1 Synchronization Interrupts (DMA0 and DMA1)


  Bit Value            Interrupt Enabled at DMA0 or DMA1            Interrupt Source for
  (in DMA0 or DMA1)    DMA0      DMA0      DMA1      DMA1           DMA Synchronization
                       Read      Write     Read      Write

                       ICRDY0    OCRDY0    ICRDY1    OCRDY1         From communication port

                       TIM0                                         From timer TIM0


This interrupt synchronization scheme allows each DMA channel to service
a corresponding input communication port and output communication port.
Also, each DMA channel can be synchronized with external interrupts and
the on-chip timers.


Table 3-4. DMA Channels 2 to 5 Synchronization Interrupts (DMA2 to DMA5)


  Bit Value            Interrupt Enabled at DMA2-DMA5 (see note)    Interrupt Source for
  (in DMA2 to DMA5)    DMAx Read          DMAx Write                DMA Synchronization

  001b                 ICRDYx             OCRDYx                    From communication port

  010b to 111b         IIOF pin interrupts, or TIM0 and TIM1        From external pins, or
                                                                    from timers TIM0 and TIM1


Note: The x in DMAx is the DMA channel number, which is also the number for the
corresponding ICRDYx and OCRDYx interrupts. For example, a 001b in both
DMA2 READ and DMA5 WRITE would enable interrupts ICRDY2 and OCRDY5,
respectively. All other viable bit values (010b to 111b) are the same (as shown
in the table) for DMA2 through DMA5.

Note that each DMA channel looks not only at the DMA synchronous inter- 
rupts selected but also at the synchronization mode that the channel is cur- 
rently using (see Table 9-4 on page 9-15). The synchronization mode is 
specified by the DMA channel control registers located in the DMA 
coprocessor.


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3.1.9 CPU Internal Interrupt Enable Register (IIE) 


The 32-bit internal interrupt enable register, shown in Figure 3-5, enables/ 
disables the following interrupts for the CPU: 


- Timers 0 and 1,
- For communication ports 0-5:
  - Input-buffer full,
  - Input-buffer ready,
  - Output-buffer ready,
  - Output-buffer empty,
- DMA coprocessor channels 0-5.


Figure 3-5 shows the IIE register bits, and Table 3-5 describes the interrupt
enabled, depending on the bit value. A 1 read means the corresponding
interrupt is enabled; a 0 indicates disabled. At reset, all zeroes are written to
the register.


Figure 3-5. Internal Interrupt Enable Register (IIE)

  Bit(s)    Field(s), most significant bit first                            Access
  31        ETINT1                                                          R/W
  30-25     EDMAINT5, EDMAINT4, EDMAINT3, EDMAINT2, EDMAINT1, EDMAINT0      R/W
  24-21     EOCEMPTY5, EOCRDY5, EICRDY5, EICFULL5                           R/W
  20-17     EOCEMPTY4, EOCRDY4, EICRDY4, EICFULL4                           R/W
  16-13     EOCEMPTY3, EOCRDY3, EICRDY3, EICFULL3                           R/W
  12-9      EOCEMPTY2, EOCRDY2, EICRDY2, EICFULL2                           R/W
  8-5       EOCEMPTY1, EOCRDY1, EICRDY1, EICFULL1                           R/W
  4-1       EOCEMPTY0, EOCRDY0, EICRDY0, EICFULL0                           R/W
  0         ETINT0                                                          R/W


R = Read, W = Write, R/W = Read/Write
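
The regular layout of the IIE register lends itself to a small lookup helper. The sketch below assumes the bit assignments given for this register: ETINT0 at bit 0, four bits per communication port starting at bit 1 (in the order EICFULL, EICRDY, EOCRDY, EOCEMPTY), EDMAINT0-EDMAINT5 at bits 25-30, and ETINT1 at bit 31. The function name iie_bit is hypothetical.

```python
COMM_PORT_FIELDS = ("EICFULL", "EICRDY", "EOCRDY", "EOCEMPTY")


def iie_bit(field, x):
    """Return the IIE bit number for a field name and its index x."""
    if field == "ETINT":
        return {0: 0, 1: 31}[x]            # timer 0 at bit 0, timer 1 at bit 31
    if x not in range(6):
        raise ValueError("x must be 0-5")
    if field == "EDMAINT":
        return 25 + x                      # DMA coprocessor channels 0-5
    return 1 + 4 * x + COMM_PORT_FIELDS.index(field)
```

An enable mask is then the OR of the selected bits; for example, (1 << iie_bit("EICFULL", 1)) | (1 << iie_bit("EDMAINT", 0)) sets bits 5 and 25.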


Table 3-5. Summary of Interrupt Enable Register Bits (IIE)


                        IIE Bit Number for x =
  IIE Bit Field Name    0    1    2    3    4    5     Enables/Disables (note 1)

  EICFULLx (note 2)     1    5    9    13   17   21    Comm. port x input-buffer full interrupt
  EICRDYx (note 2)      2    6    10   14   18   22    Comm. port x input-buffer ready interrupt
  EOCRDYx (note 2)      3    7    11   15   19   23    Comm. port x output-buffer ready interrupt
  EOCEMPTYx (note 2)    4    8    12   16   20   24    Comm. port x output-buffer empty interrupt
  EDMAINTx (note 2)     25   26   27   28   29   30    DMA coprocessor channel x interrupt
  ETINTx                0    31                        Timer x interrupt


NOTES: 1. The x represents a corresponding communication port number (0-5) or DMA
          coprocessor channel number (0-5). For example, ones in bits 5 and 25 enable
          interrupts for (a) input-buffer full of communication port 1 and for
          (b) DMA coprocessor channel 0. (A 1 enables the interrupt; a 0 disables it.)


       2. Communication port bits are grouped according to communication port number.
          Thus, communication port 0's bits are 1, 2, 3, 4; communication port 1's
          bits are 5, 6, 7, 8; etc. The DMA coprocessor channel interrupts are shown
          the same way (e.g., EDMAINT0 at bit 25, EDMAINT1 at bit 26, etc.).


3.1.10 IIOF Flag Register (IIF) Controls External Pins IIOF(3-0), Timer/DMA Flags


The IIF register controls the external interrupt pins IIOF(3-0). Use it to
specify:

- which IIOF pins are used for general-purpose I/O and which are used
  for interrupts,

- whether a general-purpose pin is input (read only) or output (read/write),

- whether an interrupt pin is for edge-triggered or level-triggered interrupts,

- if an interrupt is enabled or disabled.


Figure 3-6 depicts the IIF register bits. Table 3-6 (page 3-13) explains
these bits in detail. Interrupt traps are shown in Figure 3-7 (page 3-15).
Interrupts are further explained in Section 6.7 on page 6-23.


Figure 3-6. Interrupt Flag Register (IIF)


  Bit(s)    Field(s), most significant bit first                      Access
  31        TINT1                                                     R/W
  30-25     DMAINT5, DMAINT4, DMAINT3, DMAINT2, DMAINT1, DMAINT0      R/W
  24        TINT0                                                     R/W
  23-17     xx                                                        R
  16        NMI                                                       R
  15-12     EIIOF3, FLAG3, TYPE3, FUNC3                               R/W
  11-8      EIIOF2, FLAG2, TYPE2, FUNC2                               R/W
  7-4       EIIOF1, FLAG1, TYPE1, FUNC1                               R/W
  3-0       EIIOF0, FLAG0, TYPE0, FUNC0                               R/W


R = Read (only), R/W = Read/Write, xx = Reserved, read as 0


Table 3-6. IIF Register Bits Summary


  Bit Field Name     Bit Nos. for x =       Function (note 1)
                     0    1    2    3

  FUNCx (note 2)     0    4    8    12
      Mode of pin IIOFx:
      If FUNCx = 0, pin IIOFx is a general-purpose I/O (R/W) pin.
      If FUNCx = 1, pin IIOFx is an interrupt (R) pin.

  TYPEx (note 2)     1    5    9    13
      Type of function for pin IIOFx:
      If pin IIOFx is a general-purpose I/O pin (FUNCx = 0):
          TYPEx = 0 makes IIOFx an input pin.
          TYPEx = 1 makes IIOFx an output pin.
      If pin IIOFx is an interrupt pin (FUNCx = 1):
          TYPEx = 0 makes IIOFx an edge-triggered latched interrupt.
          TYPEx = 1 makes IIOFx a level-triggered unlatched interrupt.

  FLAGx (note 2)     2    6    10   14
      Flag for pin IIOFx:
      If pin IIOFx is a general-purpose input pin (FUNCx = 0, TYPEx = 0),
          FLAGx = the value of pin IIOFx and is read only.
      If pin IIOFx is a general-purpose output pin (FUNCx = 0, TYPEx = 1),
          FLAGx = the value on pin IIOFx and is R/W.
      If pin IIOFx is an interrupt pin (FUNCx = 1):
          FLAGx = 0 if interrupt is not asserted.
          FLAGx = 1 if interrupt is asserted.
          If 0 (zero) is written to FLAGx, the corresponding interrupt is
          cleared unless an interrupt is on the same pin; in that case,
          the interrupt will be set.

  EIIOFx (note 2)    3    7    11   15
      Disable/enable external interrupt:
      EIIOFx = 0 disables external interrupts at pin IIOFx.
      EIIOFx = 1 enables external interrupts at pin IIOFx.

  NMI                16
      Nonmaskable interrupt flag (NMI). The NMI interrupt (on the external NMI
      pin) behaves like other interrupts, except it cannot be masked (disabled)
      by the GIE bit (ST bit 13) or by writing to the NMI bit itself. It is
      temporarily masked during delayed branches and multicycle CPU operations.
      At reset, this bit is cleared. An asserted interrupt is cleared only by
      servicing the interrupt. NMI is a negative-going, edge-triggered, latched
      interrupt. It is read only.
      Reading NMI as 0 indicates the interrupt is not asserted.
      Reading NMI as 1 indicates the interrupt is asserted.

  Reserved           17-23

  TINT0              24
  TINT1              31
      Timer interrupt flags 0 and 1.
      Reading TINTx as 0 indicates the timer interrupt is not asserted.
      Reading TINTx as 1 indicates the timer interrupt is asserted.
      A zero written to this bit clears the interrupt unless the interrupt is
      asserted at the same time; in that case, the interrupt will be shown
      as asserted.

  DMAINTx            25-30
      Interrupt flag for DMA coprocessor channels 0 to 5.
      Reading DMAINTx as 0 indicates the channel interrupt is not asserted.
      Reading DMAINTx as 1 indicates the channel interrupt is asserted.
      A zero written to this bit clears the interrupt unless the interrupt is
      asserted at the same time; in that case, the interrupt will be
      shown as asserted.


NOTES: 1. The x represents the corresponding IIOF interrupt pin (IIOF3-IIOF0).
          R = Read, R/W = Read/Write.
       2. The bits for each pin are grouped in the same way as the communication
          port bits of the IIE register in Table 3-5 (see note 2) on page 3-11.
          For example, bits 0, 1, 2, 3 apply to pin IIOF0; bits 4, 5, 6, 7 apply
          to IIOF1, etc.
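
Since each IIOF pin owns four consecutive IIF bits (FUNCx, TYPEx, FLAGx, EIIOFx, from least to most significant), the configuration of a pin can be decoded from one nibble. The Python sketch below is illustrative; the function name describe_iiof_pin and the returned strings are hypothetical.

```python
def describe_iiof_pin(iif, pin):
    """Decode the four IIF bits that belong to pin IIOF<pin> (0-3)."""
    if pin not in range(4):
        raise ValueError("pin must be 0-3")
    nibble = (iif >> (4 * pin)) & 0xF
    func = nibble & 1            # FUNCx: 0 = general-purpose I/O, 1 = interrupt
    pin_type = (nibble >> 1) & 1 # TYPEx: meaning depends on FUNCx
    flag = (nibble >> 2) & 1     # FLAGx
    enabled = (nibble >> 3) & 1  # EIIOFx
    if func == 0:
        mode = "general-purpose output" if pin_type else "general-purpose input"
    else:
        mode = "level-triggered interrupt" if pin_type else "edge-triggered interrupt"
    return {"mode": mode, "flag": flag, "enabled": bool(enabled)}
```

With IIF = 0000 00B0h, for example, pin IIOF1 decodes as an enabled level-triggered interrupt whose flag is not set, while pin IIOF0 is a general-purpose input.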
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3.1.11 Block-Repeat (RS, RE) and Repeat-Count (RC) Registers 


The 32-bit repeat start address register (RS) contains the starting address 
of the block of program memory to be repeated when operating in the repeat 
mode. 


The 32-bit repeat end address register (RE) contains the ending address
of the block of program memory to be repeated when operating in the repeat
mode.


The repeat-count register (RC) is a 32-bit register used to specify the
number of times a block of code is to be repeated when performing a block
repeat. If RC contains the number n, the loop will be executed n + 1 times.
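
The n + 1 relationship is easy to get wrong by one, so a small counting model may help. The sketch below only counts executions of the block; how and when the hardware updates RC is not modeled, and the function name run_block_repeat is hypothetical.

```python
def run_block_repeat(rc, body):
    """Execute body once, plus once more for each count held in rc."""
    executions = 0
    while True:
        body()
        executions += 1
        if rc == 0:
            break
        rc -= 1
    return executions
```

To run a block 10 times, RC is therefore loaded with 9; loading 0 still executes the block once.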


3.1.12 Program Counter (PC) 


The program counter (PC) is a 32-bit register containing the address of the 
next instruction to be fetched. While the program counter is not part of the 
CPU register file, it is a register that can be modified by instructions that 
modify the program flow. 


3.1.13 Reserved Bits and Compatibility 


To retain compatibility with future members of the TMS320C4x family
of microprocessors, reserved bits that are read as zero must be written
as zero. Reserved bits that have an undefined value must not have their
current value modified. In other cases, maintain the reserved bits as
specified.
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3.2 CPU Expansion Register File 

The expansion register file contains two special control registers:

- Interrupt-vector table pointer register (IVTP),
- Trap-vector table pointer (TVTP).


Table 3-7. CPU Expansion Registers


  Assembler Syntax    Function Name

  IVTP                Interrupt-vector table pointer. Points to start of
                      interrupt-vector table (shown in Figure 3-8).

  TVTP                Trap-vector table pointer. Points to start of the
                      512-trap-vector table (shown in Figure 3-7).

Use the LDEP instruction to load (copy) an expansion register to a primary
register (e.g., to any of the auxiliary registers AR0-AR7; see Table 3-1 on
page 3-3). For example:

    LDEP  IVTP,AR5      ; IVTP contents to AR5


Likewise, use the LDPE instruction to load (copy) a primary register to an
expansion register. For example:

    LDPE  AR5,IVTP      ; AR5 contents to IVTP


Neither of these instructions affects the status register condition flags.


Note that both the interrupt-vector table and the trap-vector table are
required to lie on a 512-word boundary; thus, the nine least-significant
bits of these pointers are zeroes (i.e., 10 0000 0000b = 512 = 200h).
Write only zeroes to these bits (though the register forces these to zeroes).

The 32-bit IVTP register points to (is essentially the base address for) the
interrupt-vector table (IVT) in memory. The contents of this table are
depicted in Figure 3-8 on page 3-16.

The 32-bit TVTP register is essentially the base address for the trap-vector
table (TVT) in memory. This table, depicted in Figure 3-7, contains the
vectors for the TRAP instruction's 512 trap addresses (TRAP0-TRAP511).

The interrupt (including RESET; see Section 3.3) and trap maps can be
configured to overlap. At reset, IVTP and TVTP are set to all zeroes.
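
The alignment rule and the position of a trap vector can be sketched together. The model below assumes one vector per word, so that the vector for TRAPn sits n words above the table base; the names align_table_pointer and trap_vector_address are hypothetical.

```python
TABLE_ALIGN = 0x200    # 512-word boundary


def align_table_pointer(value):
    """Force the nine least-significant bits of an IVTP/TVTP value to zero."""
    return value & 0xFFFFFFFF & ~(TABLE_ALIGN - 1)


def trap_vector_address(tvtp, n):
    """Return the address of the vector for TRAPn (assumes one word per vector)."""
    if not 0 <= n <= 511:
        raise ValueError("trap number must be 0-511")
    return align_table_pointer(tvtp) + n
```

With the table placed at 002F F800h, for instance, the 512 vectors would occupy 002F F800h through 002F F9FFh.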


Figure 3-7. Trap Vector Table (TVT)

  TVTP + 000h     TRAP0
  TVTP + 001h     TRAP1
      ...          ...
  TVTP + 1FFh     TRAP511


Figure 3-8. Interrupt-Vector Table (IVT)

[Table diagram: one vector per word from IVTP + 000h through IVTP + 03Fh.
The entry at IVTP + 000h is marked Reserved (note 1); the entry at
IVTP + 001h refers to note 2.]


Notes: 1) ... when IVTP = 08000 0000h and RESETLOC(1,0) = 10b. See Table 3-8.
       2) NMI (nonmaskable interrupt) is discussed in Section 9.9, page 9-40.
       3) Timer interrupts TINT0 and TINT1 are enabled and programmed by the IIE
          register (subsection 3.1.9, page 3-10) and monitored at the IIF register
          (subsection 3.1.10, page 3-12).
       4) External pins IIOF0-IIOF3 are programmed in the DIE register
          (subsection 3.1.8, page 3-8) and IIF register.
       5) The communication port I/O buffers full/ready interrupts are enabled by
          the DIE and IIE registers and also discussed in Table 8-1, page 8-10
          (OUTPUT LEVEL & INPUT LEVEL bits).
       6) DMA interrupts are enabled at the IIE register and DMA channel control
          register (at bits TCC and AUX TCC explained in Table 9-1 on page 9-8).
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3.3 RESET Vector Mapping 


The ’C40’s RESET vector can reside in any one of four memory locations.
The value on two external pins (RESETLOC(1,0)) determines the RESET
vector location as shown in the following table.


Table 3-8. Four RESET Vector Locations Chosen by Values on Pins RESETLOC(1,0)


  Value at Pins      Get RESET Vector
  RESETLOC(1,0)      From Memory Address      Bus

  00                 0000 0000h               Local Bus
  01                 07FFF FFFFh              Local Bus
  10                 08000 0000h              Global Bus
  11                 0FFFF FFFFh              Global Bus


Note that if pin ROMEN = 1 and the vector at 0000 0000h is enabled (pins
RESETLOC(1,0) = 00), then the vector is mapped to address 0 of internal
ROM.


This mapping scheme of the RESET vector allows the TMS320C40 to be
integrated easily into systems having other processors with fixed RESET
vector locations. It also allows you to make the RESET vector either external
or internal (on-chip ROM) to the processor.




3.4 Memory 


The TMS320C40’s memory space of 4 gigawords (4G x 32 bits, where
1G = 2^30) is shown in the two memory maps in Figure 3-9. These maps
differ only by the makeup of the lowest address space at 0000 0000h to
0000 0FFFh. This makeup is configured by the value at pin ROMEN
(on-chip, reserved, ROM enable; pin AK4):


- ROMEN = 1. Addresses 0000h-0FFFh are an accessible on-chip
  ROM block (reserved), and 0000 1000h-000F FFFFh are reserved.


- ROMEN = 0. The on-chip (reserved) ROM is disabled, and addresses
  0000 0000h-000F FFFFh are accessible over the local bus.


Memory in both maps starting at 10 0000h is not affected by ROMEN (as
described for addresses 00000h-FFFFFh above). A general summary of
address ranges:


- 0000 0000h-000F FFFFh: Can be local bus or on-chip (reserved)
  ROM, depending on the value of pin ROMEN.

- 0010 0000h-0010 00FFh: Internal peripherals (DMA coprocessor,
  communication ports, timers, etc.).

- 0010 0100h-001F FFFFh: Internal peripheral region.

- 0020 0000h-002F F7FFh: Reserved.

  Instructions cannot be accessed in the three areas above
  (0010 0000h-002F F7FFh).

- 002F F800h-002F FBFFh: 1K RAM Block 0.

- 002F FC00h-002F FFFFh: 1K RAM Block 1.

- 0030 0000h-07FFF FFFFh: Local bus. If ROMEN = 0, another part
  of the local bus is at 00 0000h-0F FFFFh. These addresses activate
  the local bus.

- 08000 0000h-0FFFF FFFFh: Global bus.


CPU data accesses and DMA accesses can be made from any unreserved
part of the ’C40 memory map. Instruction fetches can take place at any
unreserved area of the ’C40 memory map except at the peripheral space
(addresses 0010 0000h-0010 00FFh).
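
The address ranges summarized above can be captured in a small classifier. This Python sketch is only a reading aid: the function name classify_address and the region labels are hypothetical, and romen stands for the level on pin ROMEN.

```python
def classify_address(addr, romen):
    """Return the memory-map region that a 32-bit word address falls in."""
    addr &= 0xFFFFFFFF
    if addr <= 0x000FFFFF:
        if not romen:
            return "local bus"                      # ROM disabled: low 1M is external
        return "on-chip ROM (reserved)" if addr <= 0x00000FFF else "reserved"
    if addr <= 0x001000FF:
        return "internal peripherals"
    if addr <= 0x001FFFFF:
        return "internal peripheral region"
    if addr <= 0x002FF7FF:
        return "reserved"
    if addr <= 0x002FFBFF:
        return "RAM block 0"
    if addr <= 0x002FFFFF:
        return "RAM block 1"
    if addr <= 0x7FFFFFFF:
        return "local bus"
    return "global bus"
```

Only the lowest 1M words change with ROMEN; every address from 0010 0000h upward classifies the same way in both maps.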


The ’C40’s internal ROM is currently reserved for TI internal use only. 
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3.4.1. Overall Memory Map 
Figure 3-9. Memory Maps


  Address Range                  (a) Internal ROM Disabled     (b) Internal ROM Enabled
                                     (ROMEN = 0)                   (ROMEN = 1)

  0000 0000h - 0000 0FFFh        1M Local Bus (External)       Accessible ROM (Reserved)
  0000 1000h - 000F FFFFh        1M Local Bus (External)       Reserved
  0010 0000h - 0010 00FFh        Peripherals (Internal)        Peripherals (Internal)
                                 (see Figure 3-10)             (see Figure 3-10)
  0010 0100h - 001F FFFFh        Reserved                      Reserved
  0020 0000h - 002F F7FFh        Reserved                      Reserved
  002F F800h - 002F FBFFh        1K RAM BLK 0 (Internal)       1K RAM BLK 0 (Internal)
  002F FC00h - 002F FFFFh        1K RAM BLK 1 (Internal)       1K RAM BLK 1 (Internal)
  0030 0000h - 07FFF FFFFh       Local Bus (External)          Local Bus (External)
  08000 0000h - 0FFFF FFFFh      Global Bus (External)         Global Bus (External)
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3.4.2 Peripheral Bus Memory Map 

This map resides in addresses 0010 0000h-0010 00FFh as shown in the
memory map, Figure 3-9. Each peripheral requires a 16-word area.


Figure 3-10. Peripheral Memory Map

  0010 0000h - 0010 000Fh    Local and global memory interface control registers
                             (see subsection 3.4.2.1 and Figure 3-11)
  0010 0010h - 0010 001Fh    Analysis module registers
                             (see subsection 3.4.2.2 and Figure 3-12)
  0010 0020h - 0010 002Fh    Timer 0 registers (16 words)
                             (see subsection 3.4.2.3 and Figure 3-13)
  0010 0030h - 0010 003Fh    Timer 1 registers (16 words)
                             (see subsection 3.4.2.3 and Figure 3-13)
  0010 0040h - 0010 009Fh    Communication ports 0-5, 16 words each
                             (see subsection 3.4.2.4 and Figure 3-14)
  0010 00A0h - 0010 00FFh    DMA coprocessor channels 0-5, 16 words each
                             (see subsection 3.4.2.5 and Figure 3-15)
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3.4.2.1 Local and Global Memory Interface Control Registers 


These registers control the local and global memory interfaces. They
occupy the first 16-word block of the peripheral bus memory map, shown
in Figure 3-10. The registers themselves are shown in Figure 3-11.
Chapter 7 covers the operation of these registers. A detailed description of
them is given in Figure 7-2 and Table 7-3 (pages 7-7 and 7-8).


These registers define:
- the page sizes used for the two strobes of each port,
- address ranges over which the strobes are active,
- wait states, and
- other similar operations that compose the memory interfaces.


Figure 3-11. Memory Interface Control Registers

  0010 0000h                  Global memory interface control register
  0010 0001h - 0010 0003h     Reserved
  0010 0004h                  Local memory interface control register
  0010 0005h - 0010 000Fh     Reserved


3.4.2.2 Analysis Module Registers 


These registers, the second 16-word block in the peripheral bus memory
map (Figure 3-10), are shown in Figure 3-12. These registers are
reserved for emulation functions.


Figure 3-12. Analysis Module Registers

[Register map for addresses 0010 0010h-0010 001Fh.]
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3.4.2.3 Timer Registers 


This group of registers occupies the 0010 0020h — 0010 003Fh range in the 
peripheral bus memory map, Figure 3-10, on page 3-20. Timers and their 
registers are covered in detail in Section 9.10 on page 9-45. 


Figure 3-13. Timer Registers

  Timer 0
  0010 0020h     Timer 0 Global Control Register
  0010 0024h     Timer 0 Counter Register
  0010 0028h     Timer 0 Period Register

  Timer 1
  0010 0030h     Timer 1 Global Control Register
  0010 0034h     Timer 1 Counter Register
  0010 0038h     Timer 1 Period Register

  All other addresses in 0010 0020h-0010 003Fh are reserved.
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3.4.2.4 Communication Port Memory Map 


The communication-port control registers (CPCR) and input and output
FIFO buffers are illustrated in Figure 3-14. This is the central group
of registers in the peripheral bus memory map, Figure 3-10, on page 3-20.
These are described in more detail in Chapter 8.


Figure 3-14. Communication Port Memory Map

  Port    Control Register    Input FIFO     Output FIFO    Reserved
          (CPCR)

  0       0010 0040h          0010 0041h     0010 0042h     0010 0043h - 0010 004Fh
  1       0010 0050h          0010 0051h     0010 0052h     0010 0053h - 0010 005Fh
  2       0010 0060h          0010 0061h     0010 0062h     0010 0063h - 0010 006Fh
  3       0010 0070h          0010 0071h     0010 0072h     0010 0073h - 0010 007Fh
  4       0010 0080h          0010 0081h     0010 0082h     0010 0083h - 0010 008Fh
  5       0010 0090h          0010 0091h     0010 0092h     0010 0093h - 0010 009Fh
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3.4.2.5 DMA Coprocessor Registers 


The DMA registers (shown below) are the bottom block of registers in the 
peripheral bus memory map (Figure 3-10 on page 3-20). These registers 
are described in Chapter 9. Figure 9-2, page 9-5, is an index to subjects. 


Figure 3-15. DMA Coprocessor Memory Map

  0010 00A0h - 0010 00A8h     DMA channel 0 registers (see exploded view)
  0010 00A9h - 0010 00AFh     Reserved
  0010 00B0h - 0010 00B8h     DMA channel 1 registers (see exploded view)
  0010 00B9h - 0010 00BFh     Reserved
  0010 00C0h - 0010 00C8h     DMA channel 2 registers (see exploded view)
  0010 00C9h - 0010 00CFh     Reserved
  0010 00D0h - 0010 00D8h     DMA channel 3 registers (see exploded view)
  0010 00D9h - 0010 00DFh     Reserved
  0010 00E0h - 0010 00E8h     DMA channel 4 registers (see exploded view)
  0010 00E9h - 0010 00EFh     Reserved
  0010 00F0h - 0010 00F8h     DMA channel 5 registers (see exploded view)
  0010 00F9h - 0010 00FFh     Reserved


EXPLODED VIEW OF EACH CHANNEL

  Address          Register
  0010 00z0h       Control Register x
  0010 00z1h       Source Address x
  0010 00z2h       Source Address Index x
  0010 00z3h       Transfer Counter x
  0010 00z4h       Destination Address x
  0010 00z5h       Destination Address Index x
  0010 00z6h       Link Pointer x
  0010 00z7h       Auxiliary Transfer Counter x
  0010 00z8h       Auxiliary Link Pointer x

  x = channel number (e.g., all are 1 for channel 1, all 2 for channel 2, etc.).
  z = corresponding hexadecimal digit for channel address (e.g., substitute
      an "A" for DMA channel 0, "B" for DMA channel 1, etc.).


These registers are described in Chapter 9, and an index of description
locations is listed in Figure 9-2 on page 9-5.
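
Because every DMA channel uses the same register order inside its own 16-word block, a register address follows from the channel number and the register's position. The sketch below assumes the block for channel 0 starts at 0010 00A0h and that the nine registers appear in the order of the exploded view; the Python identifiers are hypothetical.

```python
DMA_BASE = 0x001000A0          # first word of the DMA channel 0 block
DMA_REGISTERS = (
    "control",
    "source_address",
    "source_address_index",
    "transfer_counter",
    "destination_address",
    "destination_address_index",
    "link_pointer",
    "auxiliary_transfer_counter",
    "auxiliary_link_pointer",
)


def dma_register_address(channel, register):
    """Return the peripheral-bus address of one DMA channel register."""
    if channel not in range(6):
        raise ValueError("channel must be 0-5")
    return DMA_BASE + 0x10 * channel + DMA_REGISTERS.index(register)
```

For example, the link pointer of channel 1 resolves to 0010 00B6h, which matches z = "B" and offset 6 in the exploded view.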
3.5 Instruction Cache Architecture


The 128 x 32-bit instruction cache speeds instruction fetches and lowers
system cost. The instruction cache allows the use of slow external
memories while still achieving single-cycle access performance. The cache
also frees the external buses from program fetches, thus allowing the use
of these buses for DMA or other system needs. The cache can operate in a
completely automatic fashion without the need for external intervention.
It uses a form of the LRU (least recently used) cache update algorithm.


The instruction cache (see Figure 3-17 on page 3-26) contains 128 32-bit
words of RAM, enough to hold 128 words of program memory. It is divided
into four 32-word segments. Associated with each segment is a 27-bit
segment start address (SSA) register. For each word in the cache, there is
a corresponding single-bit present (P) flag.


When the CPU requests an instruction word, a check is made to determine
whether the word is already in the instruction cache. The partitioning of an
instruction address as used by the cache control algorithm is shown in
Figure 3-16. The 27 most significant bits (MSBs) of the instruction address
select the segment, and the 5 least significant bits (LSBs) define the
address of the instruction word within the pertinent segment. The 27 MSBs
of the instruction address are compared with the four SSA registers. If a
match is found, the relevant P flag is checked. The P flag indicates whether
or not the word within a particular segment is already present in cache
memory:


- P = 1: the word is already present in cache memory.
- P = 0: the location in cache is invalid (e.g., contains garbage).


Figure 3-16. Address Partitioning for Cache Control Algorithm

   31                               5 4                                0
  | Segment start address (27 bits)  | Word address in segment (5 bits) |


If there is no match, one of the segments must be replaced by the new data.
The segment replaced in this circumstance is determined by the LRU (least
recently used) algorithm. The LRU stack (see upper right of Figure 3-17 on
page 3-26) is maintained for this purpose.
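
The address partitioning described above can be sketched in a few lines of
Python. This is only an illustrative model, not code from the device tools;
the function name partition_address and the sample address are hypothetical.

```python
OFFSET_BITS = 5                        # 5 LSBs: word within a 32-word segment
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def partition_address(addr):
    """Split a 32-bit instruction address into (segment tag, word offset)."""
    addr &= 0xFFFFFFFF
    tag = addr >> OFFSET_BITS          # 27 MSBs, compared with the SSA registers
    word = addr & OFFSET_MASK          # selects the word and its P flag
    return tag, word


tag, word = partition_address(0x002FF845)
print(f"tag={tag:#x} word={word}")
```

Two addresses fall in the same cache segment exactly when their tags are
equal, which is the comparison made against the four SSA registers.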

Figure 3-17. Instruction Cache Architecture

(Figure not reproduced. It shows the four segment start address registers,
one for each of segments 0 through 3; segment words 0 through 31 within
each segment; and the LRU stack, which runs from the most recently used
segment number at the top to the least recently used segment number at
the bottom.)

The LRU stack keeps track of which segment (0-3) qualifies as the least
recently used after each access to the cache. Each time a segment is
accessed, its segment number is removed from the LRU stack and pushed
onto the top of the LRU stack. Therefore, the number at the top of the stack
is the most recently used segment number, and the number at the bottom
of the stack is the least recently used segment number.


At RESET, the following occur in the instruction cache:

- All P flags are set to zero.

- The LRU stack is initialized with segment number 0 at the top, followed
  by 1, 2, and 3 at the bottom. If any two SSA registers are equal (due to
  RESET conditions) and a cache hit occurs, the instruction word is fetched
  from the most recently used segment.


When a replacement is necessary, the least recently used segment is se- 
lected for replacement. Also, the 32 P flags for the segment to be replaced 
are set to 0, and the segment’s SSA register is replaced with the 27 MSBs 
of the instruction address. 


3.5.1 Cache Algorithm 


When the TMS320C40 requests an instruction word from external memory, 
the two possible actions are a cache hit or a cache miss. 


- Cache Hit. The cache contains the requested instruction, and the
  following actions occur:

  - The instruction word is read from the cache.

  - The number of the segment containing the word is removed from
    the LRU stack and pushed to the top of the LRU stack (if not already
    at the top), thus moving the other segment numbers toward the
    bottom of the stack.


- Cache Miss. The cache does not contain the instruction. Types of
  cache misses are

  - Subsegment miss. The segment address register matches the
    instruction address, but the relevant P flag is not set. The following
    actions occur:

    - The instruction word is read from memory and copied into the
      cache.

    - The number of the segment containing the word is removed from
      the LRU stack and pushed to the top of the LRU stack (if not
      already at the top), thus moving the other segment numbers
      toward the bottom of the stack.

    - The relevant P flag is set.

  - Segment miss. None of the segment addresses matches the
    instruction address. The following actions occur:

    - The least recently used segment is selected for replacement.
      The P flags for all 32 words are cleared.

    - The SSA register for the selected segment is loaded with the 27
      MSBs of the address of the requested instruction word.

    - The instruction word is fetched and copied into the cache. It goes
      into the appropriate word of the least recently used segment. The
      P flag for that word is set to 1.

    - The number of the segment containing the instruction word is
      removed from the LRU stack and pushed to the top of the LRU
      stack, thus moving the other segment numbers toward the
      bottom of the stack.
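
The hit, subsegment-miss, and segment-miss actions listed above can be
modeled with a small Python class. This is a behavioral sketch under stated
assumptions, not a description of the hardware implementation: the class and
method names are hypothetical, and SSA registers are modeled as holding no
valid address at reset (the text does not specify their reset contents).

```python
class InstructionCache:
    """Toy model: 4 segments x 32 words, per-word P flags, LRU stack."""

    def __init__(self):
        # Assumption: SSA contents are undefined at reset; None never matches.
        self.ssa = [None] * 4
        self.p = [[False] * 32 for _ in range(4)]   # all P flags cleared
        self.lru = [0, 1, 2, 3]                     # index 0 = top (most recent)

    def _touch(self, seg):
        """Move a segment number to the top of the LRU stack."""
        self.lru.remove(seg)
        self.lru.insert(0, seg)

    def fetch(self, addr):
        tag, word = (addr & 0xFFFFFFFF) >> 5, addr & 0x1F
        for seg in self.lru:                        # most recently used first
            if self.ssa[seg] == tag:
                kind = "hit" if self.p[seg][word] else "subsegment miss"
                self.p[seg][word] = True            # word is now present
                self._touch(seg)
                return kind
        seg = self.lru[-1]                          # least recently used segment
        self.ssa[seg] = tag                         # load SSA with the 27 MSBs
        self.p[seg] = [False] * 32                  # clear all 32 P flags
        self.p[seg][word] = True                    # then mark the fetched word
        self._touch(seg)
        return "segment miss"


cache = InstructionCache()
trace = [cache.fetch(a) for a in (0x1000, 0x1000, 0x1001, 0x2000)]
print(trace, cache.lru)
```

Searching the segments in LRU-stack order means that, if two SSA registers
happen to be equal, the word is taken from the most recently used segment,
as the text requires.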


3.5.2 Cache and System Memory 


Only instructions may be fetched from the program cache. All reads and 
writes of data in memory bypass the cache. Program fetches from internal 
memory do not modify the cache and do not generate cache hits or misses. 
The program cache is a single-access memory block. Dummy program 
fetches (i.e., following a branch) can generate cache misses and cache up- 
dates. 


Avoid using self-modifying code. If an instruction resides in cache and 
the corresponding location in primary memory is modified, the copy of the 
instruction in cache is not modified. 


Cache can be used more efficiently by aligning program code on 32-word 
address boundaries. Do this by using the ALIGN directive when coding as- 
sembly language. 
3.5.3 Cache Control Bits


Four cache control bits are located in the CPU status register: the cache
clear bit (CC), the cache enable bit (CE), the cache freeze bit (CF), and the
previous cache freeze bit (PCF), as shown in Figure 3-3 on page 3-6. The
definitions of these bits are repeated below from Table 3-2.


Cache Clear Bit (CC). Set CC = 1 to invalidate all entries in the cache
        (contents not guaranteed, "garbage"). This bit is always cleared
        after it is written to; thus, it is always read as 0. At reset, 0 is
        written to this bit. The cache P flags = 0 when cache is cleared.


Cache Enable Bit (CE). Set CE = 1 to enable the cache, allowing the cache
        to be used according to the LRU (least recently used) cache
        algorithm. Set CE = 0 to disable the cache; this prevents cache
        updates or modifications (thus no cache fetches can be made). At
        reset, 0 is written to this bit. Cache clearing (CC = 1) is allowed
        when CE = 0.


Cache Freeze Bit (CF). Set CF = 1 to freeze the cache (cannot be written to),
        including freezing of LRU (least recently used) stack manipulation.
        If the cache is enabled (CE = 1), fetches from the cache are allowed,
        but modification of the cache contents is not allowed. Cache clearing
        (CC = 1) is allowed. At reset, this bit is set to zero. When CF = 0,
        cache clearing (CC = 1) is allowed. CF is set to one when a trap or
        interrupt is taken. Also, the RETI and RETID instructions copy PCF
        to the CF bit.


        Table 3-9 defines the effect of the CE and CF bits used in
        combination.


Table 3-9. Combined Effect of the CE and CF Bits

CE   CF   Effect
0    0    Cache not enabled
0    1    Cache not enabled
1    0    Cache enabled and not frozen
1    1    Cache enabled and frozen (fetches allowed, no cache updates)


Previous Cache Freeze Bit (PCF). When an interrupt or trap vector is
        taken, the CF value is copied to the PCF bit and the CF bit is set
        to 1. This protects the cache during interrupt processing and is
        particularly useful when code loops are interrupted. The interrupt
        service routine may optionally use the cache under software control.
        Interrupts may also be nested, provided that the status register is
        saved prior to enabling the interrupts. When the instructions
        RETIcond and RETIcondD are executed to complete interrupt
        processing, the contents of the PCF bit are copied to the CF bit.
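
The interaction of CE, CF, and PCF around an interrupt can be illustrated
with a short Python model. It is a sketch only: the class CacheBits and its
method names are hypothetical, and it models just the three bits, not the
rest of the status register.

```python
from dataclasses import dataclass


@dataclass
class CacheBits:
    ce: int = 0      # cache enable
    cf: int = 0      # cache freeze
    pcf: int = 0     # previous cache freeze

    def take_interrupt(self):
        """Interrupt or trap taken: CF is saved in PCF, then CF is set to 1."""
        self.pcf = self.cf
        self.cf = 1

    def return_from_interrupt(self):
        """RETIcond / RETIcondD: PCF is copied back to CF."""
        self.cf = self.pcf

    def effect(self):
        """Combined effect of CE and CF (Table 3-9)."""
        if not self.ce:
            return "cache not enabled"
        return "cache enabled and frozen" if self.cf else "cache enabled and not frozen"


bits = CacheBits(ce=1)
bits.take_interrupt()
print(bits.effect())
bits.return_from_interrupt()
print(bits.effect())
```

Because there is only one PCF bit, a nested interrupt overwrites it; this is
why the status register must be saved before interrupts are re-enabled
inside a service routine.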
In the TMS320C40 architecture, data is organized into three fundamental
types: integer, unsigned-integer, and floating-point. Note that the terms
integer and signed-integer are considered to be equivalent. The TMS320C40
supports short and single-precision formats for signed and unsigned
integers. It also supports short, single-precision, and extended-precision
formats for floating-point data.


Floating-point operations make fast, trouble-free, accurate, and precise
computations possible. Specifically, the TMS320C40 implementation of
floating-point arithmetic facilitates floating-point operations at integer
speeds while preventing problems with overflow, operand alignment, and
other burdensome tasks common in integer operations.


This chapter discusses in detail the data formats and floating-point
operations supported on the TMS320C40.

4.1 Signed Integer Formats


The TMS320C40 supports two integer formats: a 16-bit short integer format
and a 32-bit single-precision integer format. When extended-precision
registers are used as integer operands, only bits 31-0 are used; bits 39-32
remain unchanged and unused.


4.1.1 Short Integer Format


The short integer format is a 16-bit twos-complement integer format used
for immediate integer operands. For those instructions that assume integer
operands, this format is sign extended to 32 bits (see Figure 4-1). The
range of an integer si, represented in the short integer format, is:


$-2^{15} \le si \le 2^{15} - 1$


In Figure 4-1 and other figures in this chapter, s = sign bit.


Figure 4-1. Short Integer Format and Sign Extension of Short Integer

   15 14                  0
  | s |      integer       |
(a) Short Integer Format

   31             16 15 14                  0
  | s s s . . . s s | s |      integer       |
(b) Sign Extension of a Short Integer
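
As an illustration of the sign extension shown in Figure 4-1, the following
Python sketch widens a 16-bit immediate to a 32-bit word. The function names
are hypothetical and the code models only the bit manipulation, not any
particular instruction.

```python
def sign_extend_16(imm16):
    """Sign-extend a 16-bit short integer to a 32-bit word."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def as_signed_32(word):
    """Interpret a 32-bit word as a twos-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word


for imm in (0x7FFF, 0x8000, 0xFFFE):
    word = sign_extend_16(imm)
    print(f"{imm:#06x} -> {word:#010x} ({as_signed_32(word)})")
```

The three sample values cover the most positive short integer, the most
negative short integer, and a small negative number.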


4.1.2 Single-Precision Integer Format 


In the single-precision integer format, the integer is represented in
twos-complement notation. The range of an integer sp, represented in the
single-precision integer format, is $-2^{31} \le sp \le 2^{31} - 1$.
Figure 4-2 shows the single-precision integer format.


Figure 4-2. Single-Precision Integer Format

   31 30                                0
  | s |             integer              |

4.2 Unsigned-Integer Formats


Two unsigned-integer formats are supported on the TMS320C40: a 16-bit
short format and a 32-bit single-precision format. In extended-precision
registers, the unsigned-integer operands use only bits 31-0; bits 39-32
remain unchanged.


4.2.1 Short Unsigned-Integer Format


Figure 4-3 shows the 16-bit, short, unsigned-integer format used for
immediate unsigned-integer operands. For those instructions that assume
unsigned-integer operands, this format is zero filled to 32 bits. In
Figure 4-3 below, x = MSB (1 or 0).


Figure 4-3. Short Unsigned-Integer Format and Zero Fill

   15 14                  0
  | x |  unsigned integer  |
(a) Short Unsigned-Integer Format

   31             16 15 14                  0
  | 0 0 0 . . . 0 0 | x |  unsigned integer  |
(b) Zero Fill of a Short Unsigned Integer


4.2.2 Single-Precision Unsigned-Integer Format


In the single-precision unsigned-integer format, the number is represented
as a 32-bit value, as shown in Figure 4-4.


Figure 4-4. Single-Precision Unsigned-Integer Format

   31                                   0
  |          unsigned integer            |

4.3 Floating-Point Formats


All TMS320C40 floating-point formats consist of three fields: an exponent
field (e), a single-bit sign field (s), and a fraction field (f). These are
stored as shown in Figure 4-5. The exponent field is a twos-complement
number. The sign field and fraction field may be considered as one unit and
referred to as the mantissa field (man). The mantissa is used to represent
a normalized twos-complement number. In a normalized representation, a
most significant nonsign bit is implied, thus providing an additional bit of
precision. The value of a floating-point number x as a function of the
fields e, s, and f is given as


$x = 01.f \times 2^{e}$    if s = 0
$x = 10.f \times 2^{e}$    if s = 1
$x = 0$    if e = most negative twos-complement value of the specified
           exponent field width


Figure 4-5. Generic Floating-Point Format

  |     e     | s |          f          |
               <---- man (mantissa) ---->

Note: e = exponent field
      s = single-bit sign field
      f = fraction field


Three floating-point formats are supported on the TMS320C40:


- a short floating-point format (for immediate floating-point operands)
  consisting of a 4-bit exponent, 1 sign bit, and an 11-bit fraction,

- a single-precision format consisting of an 8-bit exponent, 1 sign bit,
  and a 23-bit fraction, and

- an extended-precision format consisting of an 8-bit exponent, 1 sign bit,
  and a 31-bit fraction.
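
The value formulas above apply to all three formats, so one decoder
parameterized by field widths is enough to illustrate them. The following
Python sketch is an assumption-labeled model, not a TI library routine: the
function name decode and its argument order are hypothetical, and exact
rational arithmetic is used so that no rounding is introduced.

```python
from fractions import Fraction


def decode(e_field, s, f_field, e_bits, f_bits):
    """Value of a float given its raw exponent, sign, and fraction fields."""
    # The exponent field is a twos-complement number.
    e = e_field - (1 << e_bits) if e_field >> (e_bits - 1) else e_field
    if e == -(1 << (e_bits - 1)):          # most negative exponent encodes zero
        return Fraction(0)
    frac = Fraction(f_field, 1 << f_bits)  # 0.f
    # 01.f is 1 + 0.f; 10.f is the twos-complement value -2 + 0.f.
    mantissa = 1 + frac if s == 0 else -2 + frac
    return mantissa * Fraction(2) ** e


print(decode(0x00, 0, 0x000000, 8, 23))    # single precision, 01.0 x 2^0
print(decode(0x00, 1, 0x000000, 8, 23))    # single precision, 10.0 x 2^0
print(decode(0xF, 0, 0x400, 4, 11))        # short format, 01.1 x 2^-1
```

Note that a negative mantissa with a zero fraction has the value -2, not -1;
this asymmetry is what the conversion rules in Section 4.4 have to handle.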

4.3.1 Short Floating-Point Format


In the short floating-point format, floating-point numbers are represented
by a twos-complement 4-bit exponent field (e) and a twos-complement 12-bit
mantissa field (man) with an implied most significant nonsign bit.


Figure 4-6. Short Floating-Point Format

   15     12 11 10            0
  |    e    | s |      f       |
             <------ man ------>


Operations are performed with an implied binary point between bits 11 and
10. When the implied most significant nonsign bit is made explicit, it is
located to the immediate left of the binary point. The floating-point
twos-complement number x in the short floating-point format is given by


$x = 01.f \times 2^{e}$    if s = 0
$x = 10.f \times 2^{e}$    if s = 1
$x = 0$    if e = -8, s = 0, f = 0


You must use the following reserved values to represent zero in the short
floating-point format:


e = -8
s = 0
f = 0


The following examples illustrate the range and precision of the short
floating-point format:


Most Positive:   $x = (2 - 2^{-11}) \times 2^{7} = 2.5594 \times 10^{2}$
Least Positive:  $x = 1 \times 2^{-7} = 7.8125 \times 10^{-3}$
Least Negative:  $x = (-1 - 2^{-11}) \times 2^{-7} = -7.8163 \times 10^{-3}$
Most Negative:   $x = -2 \times 2^{7} = -2.5600 \times 10^{2}$
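
The four range values above follow directly from the field widths. As a
cross-check, the Python sketch below recomputes them from the exponent and
fraction widths; the function name format_range is hypothetical, and the
same function can be called with (8, 23) or (8, 31) for the other two
formats.

```python
from fractions import Fraction


def format_range(e_bits, f_bits):
    """Extreme nonzero values of a format with the given field widths."""
    e_max = (1 << (e_bits - 1)) - 1
    e_min = -e_max                     # the most negative exponent is reserved for zero
    ulp = Fraction(1, 1 << f_bits)     # weight of the last fraction bit
    return {
        "most_positive": (2 - ulp) * Fraction(2) ** e_max,
        "least_positive": Fraction(2) ** e_min,
        "least_negative": (-1 - ulp) * Fraction(2) ** e_min,
        "most_negative": -2 * Fraction(2) ** e_max,
    }


short = format_range(4, 11)
for name, value in short.items():
    print(f"{name:15s} {float(value): .4e}")
```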

4.3.2 Single-Precision Floating-Point Format


In the single-precision format, the floating-point number is represented by
an 8-bit exponent field (e) and a twos-complement 24-bit mantissa field
(man) with an implied most significant nonsign bit.


Operations are performed with an implied binary point between bits 23 and
22. When the implied most significant nonsign bit is made explicit, it is
located to the immediate left of the binary point. The floating-point
number x is given by


$x = 01.f \times 2^{e}$    if s = 0
$x = 10.f \times 2^{e}$    if s = 1
$x = 0$    if e = -128, s = 0, f = 0


Figure 4-7. Single-Precision Floating-Point Format

   31      24 23 22                   0
  |    e     | s |         f           |
              <--------- man ---------->


You must use the following reserved values to represent zero in the
single-precision floating-point format:

e = -128
s = 0
f = 0


The following examples illustrate the range and precision of the
single-precision floating-point format:

Most Positive:   $x = (2 - 2^{-23}) \times 2^{127} = 3.4028234 \times 10^{38}$
Least Positive:  $x = 1 \times 2^{-127} = 5.8774717 \times 10^{-39}$
Least Negative:  $x = (-1 - 2^{-23}) \times 2^{-127} = -5.8774724 \times 10^{-39}$
Most Negative:   $x = -2 \times 2^{127} = -3.4028236 \times 10^{38}$

4.3.3 Extended-Precision Floating-Point Format


In the extended-precision format, the floating-point number is represented
by an 8-bit exponent field (e) and a 32-bit mantissa field (man) with an
implied most significant nonsign bit.


Operations are performed with an implied binary point between bits 31 and
30. When the implied most significant nonsign bit is made explicit, it is
located to the immediate left of the binary point. The floating-point
number x is given by:


$x = 01.f \times 2^{e}$    if s = 0
$x = 10.f \times 2^{e}$    if s = 1
$x = 0$    if e = -128, s = 0, f = 0


Figure 4-8. Extended-Precision Floating-Point Format

   39      32 31 30                          0
  |    e     | s |            f               |
              <------------ man -------------->


You must use the following reserved values to represent zero in the
extended-precision floating-point format:


e = -128
s = 0
f = 0


The following examples illustrate the range and precision of the
extended-precision floating-point format:


Most Positive:   $x = (2 - 2^{-31}) \times 2^{127} = 3.4028236683 \times 10^{38}$
Least Positive:  $x = 1 \times 2^{-127} = 5.8774717541 \times 10^{-39}$
Least Negative:  $x = (-1 - 2^{-31}) \times 2^{-127} = -5.8774717569 \times 10^{-39}$
Most Negative:   $x = -2 \times 2^{127} = -3.4028236691 \times 10^{38}$

4.3.4 Conversion Between Floating-Point Formats


Floating-point operations assume several different formats for inputs and
outputs. These formats often require conversion from one floating-point
format to another (e.g., short floating-point format to extended-precision
floating-point format). Format conversions occur automatically in hardware,
with no overhead, as a part of the floating-point operations. Examples of
the four conversions are shown below. When a floating-point format zero is
converted to a greater precision format, it is always converted to a valid
representation of zero in that format. In the figures below, s = sign bit of
the exponent.


- Short floating-point format conversion to single-precision
  floating-point format.

   15     12  11   10          0
  | s x x x | sign |     f      |
  (a) Short Floating-Point Format

   31     28 27     24  23   22       12 11        0
  | s s s s | s x x x | sign |    f     | 0 ... 0   |
  (b) Single-Precision Floating-Point Format

  In this conversion, the exponent field is sign extended and the fraction
  field is filled with zeros.


- Short floating-point format conversion to extended-precision
  floating-point format.

   15     12  11   10          0
  | s x x x | sign |     f      |
  (a) Short Floating-Point Format

   39     36 35     32  31   30       20 19        0
  | s s s s | s x x x | sign |    f     | 0 ... 0   |
  (b) Extended-Precision Floating-Point Format

  In this conversion, the exponent field is sign extended and the fraction
  field is filled with zeros.


- Single-precision floating-point format conversion to
  extended-precision floating-point format.

   31      24  23   22               0
  |    e     | sign |       f         |
  (a) Single-Precision Floating-Point Format

   39      32  31   30               8 7       0
  |    e     | sign |       f         | 0 ... 0 |
  (b) Extended-Precision Floating-Point Format

  The fraction field is filled with zeros.


- Extended-precision floating-point format conversion to
  single-precision floating-point format.

   39      32  31   30               8 7       0
  |    e     | sign |       f         | x ... x |
  (a) Extended-Precision Floating-Point Format

   31      24  23   22               0
  |    e     | sign |       f         |
  (b) Single-Precision Floating-Point Format

  The fraction field is truncated.
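
The four conversions can be expressed as simple shifts on the raw bit
patterns. The Python sketch below is illustrative only; the function names
are hypothetical, and the zero test on the short format checks only the
reserved exponent value (an assumption based on the generic format
definition in Section 4.3).

```python
def short_to_single(h):
    """16-bit short float (e:4 | s:1 | f:11) -> 32-bit single (e:8 | s:1 | f:23)."""
    e = (h >> 12) & 0xF
    man = h & 0xFFF                    # sign bit plus 11-bit fraction
    if e == 0x8:                       # e = -8 encodes zero in the short format
        return 0x80 << 24              # valid single-precision zero: e = -128
    if e & 0x8:
        e |= 0xF0                      # sign-extend the exponent to 8 bits
    return (e << 24) | (man << 12)     # fraction is zero filled on the right


def single_to_extended(w):
    """32-bit single -> 40-bit extended: append eight zero fraction bits."""
    return (w & 0xFFFFFFFF) << 8


def short_to_extended(h):
    """16-bit short -> 40-bit extended."""
    return single_to_extended(short_to_single(h))


def extended_to_single(x):
    """40-bit extended -> 32-bit single: truncate the eight fraction LSBs."""
    return (x >> 8) & 0xFFFFFFFF


print(f"{short_to_single(0x0400):#010x}")      # 1.5 in both formats
print(f"{short_to_extended(0xF800):#012x}")    # -1.0 in both formats
```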

4.4 Floating-Point Conversions (IEEE Std. 754/'C4x)


Figure 4-9. IEEE Single-Precision Std. 754 Floating-Point Format

   31  30      23 22                   0
  | s |    e     |         f            |


This IEEE format is depicted in Figure 4-9 above. The following five cases
define the value v of a number expressed in this format:


1) If $e = 255$ and $f \ne 0$, then $v = \mathrm{NaN}$ †
2) If $e = 255$ and $f = 0$, then $v = (-1)^{s} \infty$
3) If $0 < e < 255$, then $v = (-1)^{s} \times 2^{e-127} (1.f)$
4) If $e = 0$ and $f \ne 0$, then $v = (-1)^{s} \times 2^{-126} (0.f)$
5) If $e = 0$ and $f = 0$, then $v = (-1)^{s} 0$ (zero)


where s = sign bit, e = the exponent field, f = the fraction field.


For the above five representations, e is treated as an unsigned integer.
Case 1 generates NaN (not a number) and is primarily used for software
signaling. Case 4 represents a denormalized number. Case 5 represents
positive and negative zero.


Figure 4-10. TMS320C4x Single-Precision Twos-Complement Floating-Point
             Format ‡

   31      24 23 22                   0
  |    e     | s |         f           |


In comparison, Figure 4-10 shows the 'C40 twos-complement floating-point
format. In this format, two cases can be used to define the value v of a
number:


1) If $e = -128$, then $v = 0$
2) If $e \ne -128$, then $v = s\bar{s}.f \times 2^{e}$ (that is,
   $01.f \times 2^{e}$ for s = 0 and $10.f \times 2^{e}$ for s = 1)


where s = sign bit, e = the exponent field, f = the fraction field.


† NaN = not a number
‡ Same format as for the TMS320C3x


For this representation, e is treated as a twos-complement integer. The
fraction and sign bit form a normalized twos-complement mantissa.


Note: Symbols to Differentiate Between IEEE and 'C40 Formats

In order to differentiate between the symbols used to define these two
formats, all IEEE fields are subscripted with an IEEE (e.g., e_IEEE, s_IEEE,
etc.). Similarly, all twos-complement fields are subscripted with a two
(i.e., e_two, s_two, f_two).


4.4.1 Converting IEEE Format to Twos-Complement
      Floating-Point Format


The most common conversion is the IEEE to twos-complement format. This
conversion is done according to rules in the following table:


Table 4-1. Rules for Converting IEEE Format to Twos-Complement Floating-Point Format

        If These Values Are Present          Then These Values Equal
Case    e_IEEE            s_IEEE  f_IEEE     e_two           s_two  f_two
1       255               0       any        7Fh             0      7F FFFFh
2       255               1       any        7Fh             1      00 0000h
3       0 < e_IEEE < 255  0       f_IEEE     e_IEEE - 7Fh    0      f_IEEE
4       0 < e_IEEE < 255  1       nonzero    e_IEEE - 7Fh    1      ~f_IEEE + 1 †
5       0 < e_IEEE < 255  1       0          e_IEEE - 80h    1      00 0000h
6       0                 any     any        80h             0      00 0000h

† ~f_IEEE = ones complement of f_IEEE.

Case 1 maps the IEEE positive NaNs and positive infinity to the
single-precision twos-complement most positive number. Overflow is also
signaled to allow you to check for these special cases.


Case 2 maps the IEEE negative NaNs and negative infinity to the
single-precision twos-complement most negative number. Overflow is also
signaled to allow you to check for these special cases.


Case 3 maps the IEEE positive normalized numbers to the identical value
in the twos-complement positive number.


Case 4 maps the IEEE negative normalized numbers with a nonzero fraction
to the identical value in the twos-complement negative number.


Case 5 maps the IEEE negative normalized numbers with a zero fraction
to the identical value in the twos-complement negative number.


Case 6 maps the IEEE positive and negative denormalized numbers and
positive and negative zeroes to a twos-complement zero.


The TMS320C40 assumes that the IEEE numbers are stored as an integer
in memory or in a register. When converted, they are always placed in an
extended-precision register by using the exponent and fraction fields of
these registers. Any arithmetic operations that are performed on the
fraction field of the IEEE number should be performed only on the IEEE
fraction field. The eight LSBs of the extended-precision register are set
to zero.
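
The six cases described above can be modeled in software. The Python sketch
below is an assumption-labeled illustration of the case descriptions, not
the device's conversion hardware: the function names are hypothetical, the
result is returned as a 32-bit pattern (e:8 | s:1 | f:23) rather than in a
40-bit extended-precision register, and the overflow flag is not modeled.

```python
import struct


def ieee_bits(x):
    """32-bit IEEE 754 single-precision pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def ieee_to_c40(bits):
    """Map an IEEE single pattern to a twos-complement float pattern."""
    s, e, f = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if e == 0:                               # case 6: zeros and denormals
        return 0x80 << 24
    if e == 255:                             # cases 1 and 2: NaN, infinity
        e2, f2 = 0x7F, (0x7FFFFF if s == 0 else 0)
    elif s == 0:                             # case 3
        e2, f2 = e - 0x7F, f
    elif f != 0:                             # case 4: ones complement plus 1
        e2, f2 = e - 0x7F, (-f) & 0x7FFFFF
    else:                                    # case 5: -1.0 x 2^n is -2.0 x 2^(n-1)
        e2, f2 = e - 0x80, 0
    return ((e2 & 0xFF) << 24) | (s << 23) | f2


def c40_value(word):
    """Numeric value of a twos-complement single-precision pattern."""
    e = (word >> 24) & 0xFF
    e = e - 256 if e & 0x80 else e
    if e == -128:
        return 0.0
    frac = (word & 0x7FFFFF) / 2 ** 23
    mantissa = 1 + frac if (word >> 23) & 1 == 0 else -2 + frac
    return mantissa * 2.0 ** e


for x in (1.5, -1.5, -1.0, 0.0):
    print(x, f"{ieee_to_c40(ieee_bits(x)):#010x}")
```

Decoding the converted pattern with c40_value returns the original number
for every normalized input, which is the "identical value" property stated
for cases 3 through 5.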


4.4.2 Converting Twos-Complement Floating-Point Format
      to IEEE Format


This conversion is done according to rules in the following table:


Table 4-2. Rules for Converting Twos-Complement Floating-Point Format to IEEE Format

        If These Values Are Present           Then These Values Equal
Case    e_two                s_two  f_two     e_IEEE         s_IEEE  f_IEEE
1       -128                 any    any       0              0       00 0000h
2       -127                 any    any       0              0       00 0000h
3       -126 <= e_two <= 127 0      f_two     e_two + 7Fh    0       f_two
4       -126 <= e_two <= 127 1      nonzero   e_two + 7Fh    1       ~f_two + 1 †
5       -126 <= e_two < 127  1      0         e_two + 80h    1       00 0000h
6       127                  1      0         FFh            1       00 0000h

† ~f_two = ones complement of f_two.


Case 1 maps a twos-complement zero to a positive IEEE zero.


Case 2 maps the twos-complement numbers that are too small to be
represented as normalized IEEE numbers to a positive IEEE zero.


Case 3 maps the positive twos-complement numbers that are not covered
by case 2 into the identically valued IEEE number.


Case 4 maps the negative twos-complement numbers with a nonzero fraction
that are not covered in case 2 into the identically valued IEEE number.


Case 5 maps all the negative twos-complement numbers with a zero
fraction, except for the most negative twos-complement number and those
covered by case 2, into the identically valued IEEE number.


Case 6 maps the most negative twos-complement number to the IEEE
negative infinity.


The TMS320C4x assumes that the twos-complement numbers are in
memory or are in an extended-precision register using the exponent and
fraction field of the register (shown in Figure 4-10 on page 4-11). If the
value is in an extended-precision register, then only the 24 MSBs of the
fraction field are manipulated as the fraction field and for detection of
the special cases. The result of the conversion goes to a register as an
integer.
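
The reverse mapping can be sketched the same way. As before, this Python
model is an illustration of the six case descriptions under stated
assumptions: the function name c40_to_ieee is hypothetical, and the input is
taken as a 32-bit pattern (e:8 | s:1 | f:23) rather than an
extended-precision register.

```python
import struct


def c40_to_ieee(word):
    """Map a twos-complement float pattern to an IEEE single pattern."""
    e = (word >> 24) & 0xFF
    e = e - 256 if e & 0x80 else e           # exponent is twos complement
    s = (word >> 23) & 1
    f = word & 0x7FFFFF
    if e <= -127:                            # cases 1 and 2: positive IEEE zero
        return 0
    if s == 0:                               # case 3
        e_i, f_i = e + 0x7F, f
    elif f != 0:                             # case 4: ones complement plus 1
        e_i, f_i = e + 0x7F, (-f) & 0x7FFFFF
    elif e < 127:                            # case 5: -2.0 x 2^n is -1.0 x 2^(n+1)
        e_i, f_i = e + 0x80, 0
    else:                                    # case 6: most negative number
        e_i, f_i = 0xFF, 0
    return (s << 31) | (e_i << 23) | f_i


def ieee_value(bits):
    """Python float for a 32-bit IEEE single pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


for word in (0x00400000, 0x00C00000, 0xFF800000, 0x80000000):
    print(f"{word:#010x} -> {ieee_value(c40_to_ieee(word))}")
```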

4.5 Floating-Point Multiplication


A floating-point number a can be written in floating-point format as in the
following formula, where a(man) is the mantissa and a(exp) is the exponent.


$a = a(\text{man}) \times 2^{a(\text{exp})}$

The product of a and b is c, defined as

$c = a \times b = a(\text{man}) \times b(\text{man}) \times 2^{(a(\text{exp}) + b(\text{exp}))}$

$c(\text{man}) = a(\text{man}) \times b(\text{man})$

$c(\text{exp}) = a(\text{exp}) + b(\text{exp})$


During floating-point multiplication, source operands are always assumed
to be in the extended-precision floating-point format:


- If the source of the operands is in short floating-point format, it is
  extended to the extended-precision floating-point format.


- If the source of the operands is in single-precision floating-point
  format, it is extended to extended-precision format.


These conversions occur automatically in hardware with no overhead. All
results of floating-point multiplications are in the extended-precision
format. These multiplications occur in a single cycle.


Figure 4-11. Flowchart for Floating-Point Multiplication

(Flowchart not reproduced. Inputs are a(man), b(man), a(exp), and b(exp).
Recoverable box labels: normalize; c(man) >> 1 and c(exp) = c(exp) + 1;
c(man) >> 2 and c(exp) = c(exp) + 2; put c(man) in extended-precision
floating-point format; c(exp) = -128 and c(man) = 0; if c(man) > 0, set c
to most positive value; if c(man) < 0, set c to most negative value.)

Figure 4-11 is a flowchart showing floating-point multiplication:


1)  In step 1 (steps are shown as numbers in parentheses), the 32-bit
    source operand mantissas are multiplied, producing a 64-bit result
    c(man). (Note that input and output data are always represented as
    normalized numbers.)


2)  In step 2, the exponents are added, yielding c(exp).

3)  Steps 3 through 6 check for special cases.


4)  Step 3 checks whether c(man) in extended-precision format is equal
    to zero. If c(man) is zero, step 7 sets c(exp) to -128, thus yielding
    the representation for zero.


5)  Steps 4 and 5 normalize the result.


6)  If a right shift of one is necessary, then in step 8, c(man) is
    right-shifted one bit, and one is added to c(exp).


7)  If a right shift of two is necessary, then in step 9, c(man) is
    right-shifted two bits, and two is added to c(exp). Step 6 occurs when
    the result is normalized.


8)  In step 10, c(man) is set in the extended-precision floating-point
    format.

9)  Steps 11 through 16 check for special cases of c(exp).


10) In step 14, if c(exp) has overflowed (step 11), c is set to the most
    positive extended-precision format value when c(man) is positive and
    to the most negative extended-precision format value when c(man) is
    negative.


11) If c(exp) has underflowed (step 12), then c is set to zero (step 15);
    i.e., c(man) = 0 and c(exp) = -128.
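
The steps above can be followed in a short Python model. This is a
behavioral sketch rather than the hardware data path: numbers are passed as
hypothetical (mantissa, exponent) pairs with the mantissa held as an exact
Fraction in [1, 2) or [-2, -1), zero is the pair (0, -128), and plain
truncation to 31 fraction bits is assumed for step 10.

```python
from fractions import Fraction
import math

ZERO = (Fraction(0), -128)
MOST_POSITIVE = (2 - Fraction(1, 1 << 31), 127)
MOST_NEGATIVE = (Fraction(-2), 127)


def is_normalized(m):
    """True for mantissas of the form 01.f or 10.f."""
    return 1 <= m < 2 or -2 <= m < -1


def fp_multiply(a, b):
    (a_man, a_exp), (b_man, b_exp) = a, b
    if a_exp == -128 or b_exp == -128:       # a zero operand gives a zero result
        return ZERO
    c_man = a_man * b_man                    # step 1: multiply mantissas
    c_exp = a_exp + b_exp                    # step 2: add exponents
    for _ in range(2):                       # steps 4-9: right shift by 0, 1, or 2
        if is_normalized(c_man):
            break
        c_man /= 2
        c_exp += 1
    # Step 10: keep 31 fraction bits (assumption: plain truncation).
    c_man = Fraction(math.floor(c_man * (1 << 31)), 1 << 31)
    if c_exp > 127:                          # exponent overflow: saturate
        return MOST_POSITIVE if c_man > 0 else MOST_NEGATIVE
    if c_exp <= -128:                        # exponent underflow: zero
        return ZERO
    return c_man, c_exp


print(fp_multiply((Fraction(-2), 0), (Fraction(-2), 0)))      # shift of two
print(fp_multiply((Fraction(3, 2), 0), (Fraction(3, 2), 0)))  # shift of one
print(fp_multiply((Fraction(1), 0), (Fraction(1), 0)))        # no shift
```

The three calls correspond to the shift-of-two, shift-of-one, and no-shift
situations that can arise for normalized inputs.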

The following examples illustrate how floating-point multiplication is
performed on the TMS320C40. For these examples, the implied most
significant nonsign bit is made explicit.


Example 4-1. Floating-Point Multiply (Both Mantissas = -2.0)


Let

$a = -2.0 \times 2^{a(\text{exp})} = 10.00000000000000000000000 \times 2^{a(\text{exp})}$
$b = -2.0 \times 2^{b(\text{exp})} = 10.00000000000000000000000 \times 2^{b(\text{exp})}$


where a and b are both represented in binary form according to the
normalized single-precision floating-point format. Then

$a \times b = 0100.000\ldots0 \times 2^{(a(\text{exp}) + b(\text{exp}))}$

To place this number in the proper normalized format, it is necessary to
shift the mantissa two places to the right and add two to the exponent.
This yields

$a \times b = 01.00000000000000000000000 \times 2^{(a(\text{exp}) + b(\text{exp}) + 2)}$

In floating-point multiplication, the exponent of the result may overflow.
This can occur when the exponents are initially added or when the exponent
is modified during normalization.


Example 4-2. Floating-Point Multiply (Both Mantissas = 1.5)


Let

$a = 1.5 \times 2^{a(\text{exp})} = 01.10000000000000000000000 \times 2^{a(\text{exp})}$
$b = 1.5 \times 2^{b(\text{exp})} = 01.10000000000000000000000 \times 2^{b(\text{exp})}$


where a and b are both represented in binary form according to the
single-precision floating-point format. Then

$a \times b = 0010.0100\ldots0 \times 2^{(a(\text{exp}) + b(\text{exp}))}$

To place this number in the proper normalized format, it is necessary to
shift the mantissa one place to the right and add one to the exponent.
This yields

$a \times b = 01.00100000000000000000000 \times 2^{(a(\text{exp}) + b(\text{exp}) + 1)}$


Example 4-3. Floating-Point Multiply (Both Mantissas = 1.0)

Let

$a = 1.0 \times 2^{a(\text{exp})} = 01.00000000000000000000000 \times 2^{a(\text{exp})}$
$b = 1.0 \times 2^{b(\text{exp})} = 01.00000000000000000000000 \times 2^{b(\text{exp})}$


where a and b are both represented in binary form according to the
single-precision floating-point format. Then

$a \times b = 0001.000\ldots0 \times 2^{(a(\text{exp}) + b(\text{exp}))}$

This number is in the proper normalized format. Therefore, no shift of the
mantissa or modification of the exponent is necessary.


These examples have shown cases where the product of two normalized
numbers can be normalized with a shift of zero, one, or two. For all
normalized inputs with the floating-point format used by the TMS320C40, a
normalized result can be produced by a shift of zero, one, or two.


Example 4-4. Floating-Point Multiply Between Positive and Negative Numbers

Let

$a = 1.0 \times 2^{a(\text{exp})} = 01.00000000000000000000000 \times 2^{a(\text{exp})}$
$b = -2.0 \times 2^{b(\text{exp})} = 10.00000000000000000000000 \times 2^{b(\text{exp})}$


The result is $c = -2.0 \times 2^{(a(\text{exp}) + b(\text{exp}))}$


Example 4-5. Floating-Point Multiply by Zero

All multiplications by a floating-point zero yield a result of zero (f = 0,
s = 0, and exp = -128).

4.6 Floating-Point Addition and Subtraction


In floating-point addition and subtraction, two floating-point numbers a and
b can be defined as


$a = a(\text{man}) \times 2^{a(\text{exp})}$
$b = b(\text{man}) \times 2^{b(\text{exp})}$


The sum (or difference) of a and b can be defined as


$c = a \pm b$
$\;\; = \left(a(\text{man}) \pm \left(b(\text{man}) \times 2^{-(a(\text{exp}) - b(\text{exp}))}\right)\right) \times 2^{a(\text{exp})}$,
    if $a(\text{exp}) \ge b(\text{exp})$
$\;\; = \left(\left(a(\text{man}) \times 2^{-(b(\text{exp}) - a(\text{exp}))}\right) \pm b(\text{man})\right) \times 2^{b(\text{exp})}$,
    if $a(\text{exp}) < b(\text{exp})$


Figure 4-12 is the flowchart for floating-point addition. Since this
flowchart assumes signed data, it is also appropriate for floating-point
subtraction. In this figure, it is assumed that a(exp) <= b(exp). In step 1
(steps are numbers in parentheses), the source exponents are compared, and
c(exp) is set equal to the larger of the two source exponents. In step 2, d
is set to the difference of the two exponents. In step 3, the mantissa with
the smaller exponent, in this case a(man), is right shifted d bits in order
to align the mantissas. After the mantissas have been aligned, they are
added (step 4).


Steps 5 through 7 check for a special case of c(man). If c(man) is zero
(step 5), then c(exp) is set to its most negative value (step 8) to yield the
correct representation of zero. If c(man) has overflowed (step 6), then in
step 9, c(man) is right shifted one bit, and one is added to c(exp). In step
10, the result is normalized. In steps 11 and 12, special cases of c(exp) are
tested. If c(exp) has overflowed, then c is set to the most positive
extended-precision value if it is positive; otherwise, it is set to the most
negative extended-precision value.
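
The addition procedure can be modeled in the same style as the formulas
above. The Python sketch below is an idealized illustration, not the
hardware algorithm: numbers are hypothetical (mantissa, exponent) pairs with
exact Fraction mantissas, zero is the pair (0, -128), and bits shifted out
during alignment are kept rather than discarded.

```python
from fractions import Fraction

ZERO = (Fraction(0), -128)
MOST_POSITIVE = (2 - Fraction(1, 1 << 31), 127)
MOST_NEGATIVE = (Fraction(-2), 127)


def fp_add(a, b):
    (a_man, a_exp), (b_man, b_exp) = a, b
    if a_exp > b_exp:                        # arrange a(exp) <= b(exp)
        a_man, a_exp, b_man, b_exp = b_man, b_exp, a_man, a_exp
    c_exp = b_exp                            # step 1: larger exponent
    d = b_exp - a_exp                        # step 2: exponent difference
    c_man = Fraction(a_man) / (1 << d) + b_man   # steps 3 and 4: align, add
    if c_man == 0:                           # steps 5 and 8: exact zero
        return ZERO
    while c_man >= 2 or c_man < -2:          # steps 6 and 9: mantissa overflow
        c_man /= 2
        c_exp += 1
    while not (1 <= c_man < 2 or -2 <= c_man < -1):   # step 10: normalize
        c_man *= 2
        c_exp -= 1
    if c_exp > 127:                          # exponent overflow: saturate
        return MOST_POSITIVE if c_man > 0 else MOST_NEGATIVE
    if c_exp <= -128:                        # exponent underflow: zero
        return ZERO
    return c_man, c_exp


# 1.5 + 0.5, with 0.5 written as 01.0 x 2^-1
print(fp_add((Fraction(3, 2), 0), (Fraction(1), -1)))
```

Subtraction can be tried by passing the negated second operand; for example,
-1.0 is the normalized pair (Fraction(-2), -1).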

Figure 4-12. Flowchart for Floating-Point Addition

(Flowchart not reproduced. Inputs are a(man), b(man), a(exp), and b(exp).
Recoverable box labels: c(man) = a(man) + b(man); overflow of c(man);
c(man) = c(man) >> 1 and c(exp) = c(exp) + 1; discard LSBs to keep in
extended-precision floating-point format; c(exp) underflow, c(exp) in
range, c(exp) overflow; set c to zero, c(exp) = -128 and c(man) = 0;
if c(man) > 0, set c to most positive value; if c(man) < 0, set c to most
negative value; c = a + b.)

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Floating-Point Addition and Subiraction ae 


The following examples describe the floating-point addition and subtraction
operations. It is assumed that the data is in the extended-precision
floating-point format.


Example 4-6. Floating-Point Addition


In the case of two normalized numbers to be summed, let


   a = 1.5 = 01.1000000000000000000000000000000 x 2^0
   b = 0.5 = 01.0000000000000000000000000000000 x 2^-1


It is necessary to shift b to the right by one so that a and b have the same
exponent. This yields


   b = 0.5 = 00.1000000000000000000000000000000 x 2^0


Then


   a + b = 010.0000000000000000000000000000000 x 2^0


As in the case of multiplication, it is necessary to shift the binary point one
place to the left and to add one to the exponent. This yields


   a + b = 2.0 = 01.0000000000000000000000000000000 x 2^1

Example 4-7. Floating-Point Subtraction


A subtraction is performed in this example. Let


   a = 01.0000000000000000000000000000001 x 2^0
   b = 01.0000000000000000000000000000000 x 2^0


The operation to be performed is a - b. The mantissas are already aligned
because the two numbers have the same exponent. The result is a large
cancellation of the upper bits, as shown below.


   a - b = 00.0000000000000000000000000000001 x 2^0


The result must be normalized. In this case, a left shift of 31 is required. The
exponent of the result is modified accordingly. The result is


   a - b = 01.0000000000000000000000000000000 x 2^-31


Example 4-8. Floating-Point Addition With a 32-Bit Shift


This example illustrates a situation where a full 32-bit shift is necessary to
normalize the result. Let


   a = 01.1111111111111111111111111111111 x 2^127
   b = 10.0000000000000000000000000000000 x 2^127


The operation to be performed is a + b.


   a + b = 11.1111111111111111111111111111111 x 2^127


Normalizing the result requires a left shift of 32 and a subtraction of 32 from
the exponent. The result is


   a + b = 10.0000000000000000000000000000000 x 2^95


Example 4-9. Floating-Point Addition/Subtraction and Zero


When floating-point addition and subtraction are performed with a
floating-point 0, the following identities are satisfied:


   a ± 0 = a (a ≠ 0)
   0 ± 0 = 0
   0 - a = -a (a ≠ 0)
4.7 Normalization (NORM Instruction)


The NORM instruction normalizes an extended-precision floating-point 
number that is assumed to be unnormalized. Since the number is assumed 
to be unnormalized, no implied most significant nonsign bit is assumed. The 
NORM instruction executes the following three steps: 


1) Locates the most significant nonsign bit of the floating-point number. 
2) Left shifts to normalize the number. 
3) Adjusts the exponent. 


Given the extended-precision floating-point value a to be normalized, the 
normalization, norm ( ), is performed as shown in Figure 4—13. 
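
The three steps can be sketched in Python as follows. This is a simplified model rather than the exact NORM datapath: it assumes the same scaling used for illustration elsewhere (value = man x 2^(exp - 31), with man a signed integer), and the helper names norm and leading_run are hypothetical.

```python
HALF = 1 << 31  # in the assumed model a normalized mantissa has man >= HALF or man < -HALF


def norm(man, exp):
    """Left shift man until it is normalized and adjust exp by the shift count."""
    if man == 0:
        return 0, -128            # zero is represented with the most negative exponent
    while -HALF <= man < HALF:    # top nonsign bit not yet in place
        man <<= 1
        exp -= 1
    return man, exp


def leading_run(word):
    """Count the leading bits of a 32-bit field that equal its top bit."""
    word &= 0xFFFFFFFF
    top = word >> 31
    count = 0
    for bit in range(31, -1, -1):
        if (word >> bit) & 1 != top:
            break
        count += 1
    return count


# With an initial exponent of zero, |final exponent| matches the run length.
for value in (1, 3, 0x00F0, -2, -3):
    _, exp = norm(value, 0)
    print(value, abs(exp), leading_run(value))
```

Under these assumptions the absolute value of the final exponent equals the number of leading zeros (or leading ones) of the 32-bit field when the exponent starts at zero.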


Figure 4-13. Flowchart for NORM Instruction Operation

[Flowchart not reproduced. Recoverable steps: test a(man) against 0;
k = number of leading nonsignificant sign bits; sign-extend a(man) by 1 bit;
c(man) = a(man) << k.]
Example 4-10. NORM Instruction


Assume that an extended-precision register contains the value


When the normalization is performed on a number assumed to be
unnormalized, the binary point is assumed to be


This number is then sign extended one bit so that the mantissa contains 33
bits.


The intermediate result after the most significant nonsign bit is located and
the shift is performed is:


The NORM instruction is useful for counting the number of leading zeros or
leading ones in a 32-bit field. If the exponent is initially zero, the absolute
value of the final value of the exponent is the number of leading ones or
zeros. This instruction is also useful for manipulating unnormalized
floating-point numbers.
4.8 Rounding (RND Instruction)


The RND instruction rounds a number from the extended-precision
floating-point format to the single-precision floating-point format. Rounding is
similar to floating-point addition. Given the number a to be rounded, the
following operation is performed first.


   c = a(man) x 2^a(exp) + (1 x 2^(a(exp) - 24))


Next, a conversion from extended-precision floating-point to single-precision
floating-point format is performed. Given the extended-precision
floating-point value, the rounding, rnd( ), is performed as shown in Figure 4-14.
Figure 4-14. Flowchart for Floating-Point Rounding by the RND Instruction

[Flowchart not reproduced. Recoverable steps: add 1 x 2^(a(exp) - 24) to a;
on exponent overflow, if c(man) > 0, set c to the most positive
single-precision value; if c(man) < 0, set c to the most negative
single-precision value. Result: c = rnd(a).]
4.9 Floating-Point-to-Integer Conversion (FIX Instruction)


Floating-point-to-integer conversion, using the FIX instructions, allows
extended-precision floating-point numbers to be converted to single-precision
integers in a single cycle. The floating-point-to-integer conversion of the
value x is referred to here as fix(x). The conversion does not overflow if a,
the number to be converted, is in the range


   -2^31 <= a <= 2^31 - 1


First, you must be certain that


   a(exp) <= 30


If these bounds are not met, an overflow occurs. If an overflow occurs in the
positive direction, the output is the most positive integer. If an overflow
occurs in the negative direction, the output is the most negative integer. If
a(exp) is within the valid range, then a(man), with implied bit included, is
sign-extended and right-shifted (rs) by the amount


   rs = 31 - a(exp)


This right shift (rs) shifts out those bits corresponding to the fractional part
of the mantissa. For example:


   If 0 <= x < 1, then fix(x) = 0.
   If -1 <= x < 0, then fix(x) = -1.


The flowchart for the floating-point-to-integer conversion is shown in
Figure 4-15.
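
The conversion described above can be sketched in Python. This is a hedged illustration, not the device implementation: it assumes the mantissa is supplied as a signed integer with the implied bit included (value = man x 2^(exp - 31)), and the names fix, INT_MAX, and INT_MIN are hypothetical.

```python
INT_MAX = (1 << 31) - 1   # most positive 32-bit integer
INT_MIN = -(1 << 31)      # most negative 32-bit integer


def fix(man, exp):
    """Convert value = man * 2**(exp - 31) to an integer, saturating on overflow."""
    if man == 0:
        return 0
    if exp > 30:                      # bound a(exp) <= 30 not met: overflow
        return INT_MAX if man > 0 else INT_MIN
    rs = 31 - exp                     # right-shift amount from the text
    return man >> rs                  # arithmetic shift discards the fractional bits


print(fix(1 << 31, -1))      # 0.5  -> 0
print(fix(-(1 << 32), -2))   # -0.5 -> -1
print(fix(5 << 29, 2))       # 5.0  -> 5
```

Because the shift is arithmetic, negative values round toward minus infinity, which matches fix(x) = -1 for -1 <= x < 0.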
Figure 4-15. Flowchart for Floating-Point-to-Integer Conversion by FIX Instructions

[Flowchart not reproduced. Recoverable steps: on overflow, if a(man) > 0,
c = most positive integer; if a(man) < 0, c = most negative integer.
Result: c = fix(a).]
4.10 Integer-to-Floating-Point Conversion (FLOAT Instruction)


Integer-to-floating-point conversion, using the FLOAT instruction, allows
single-precision integers to be converted to extended-precision
floating-point numbers. The flowchart for this conversion is shown in
Figure 4-16.


Figure 4-16. Flowchart for Integer-to-Floating-Point Conversion by FLOAT Instructions

[Flowchart not reproduced. Recoverable steps: k = number of leading
nonsignificant sign bits; c(man) = c(man) << k; c(exp) is set to -128 or to
30 - k; remove the most significant nonsign bit. Result: c = float(a).]
4.11 Reciprocal (RCPF Instruction)


The RCPF instruction generates a satisfactory estimate of the reciprocal of
a floating-point number. The estimate has the correct exponent, and the
mantissa is often accurate to the eighth binary position (mantissa error is
thus < 2^-8). This estimate may also be used as a seed for an algorithm that
computes the reciprocal to even greater accuracy. (The Newton-Raphson
algorithm, described in this section, is one such case.)


Figure 4-17 depicts the algorithm used by the RCPF instruction.

- The input is assumed to be v = vman x 2^vexp.
- The output is assumed to be x = xman x 2^xexp.
- vexp is negated.
- If vexp = -128, the result is saturated to the most positive number, and
  the overflow flag is set. The N condition flag is set to the same sign as
  vsign.


Figure 4-17. RCPF Instruction Algorithm

[Diagram not reproduced. Recoverable labels: inputs vexp, vsign, and
vfrac(22..15); outputs xexp, xfrac(22..15), and xman.]
The lookup table is addressed by forming a nine-bit address consisting of
vsign and bits 22-15 of vfrac. The eight-bit output of the lookup table
forms bits 22-15 of xfrac. Bits 14-0 of xfrac are set to zero. xsign is set to
vsign.

The lookup-table values are generated from simulation results.


4.11.1 Reciprocal Algorithm 


The RCPF instruction provides the reciprocal of a number. The estimate has
the correct exponent and a mantissa accurate to the eighth binary place
(i.e., the error of the mantissa is < 2^-8). The Newton-Raphson algorithm
(shown below) may be used to further extend the mantissa’s precision:


   x[n+1] = x[n] (2 - v x[n])


where v = the number whose reciprocal is to be found.


x[0], the seed for the algorithm, is given by RCPF. For each iteration of the
algorithm, the number of accurate bits in the mantissa doubles. Using
RCPF, you can start with an estimate accurate to eight bits. With one
iteration, accuracy is 16 bits in the mantissa, and with a second iteration,
accuracy is 32 bits.
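
The doubling of accuracy can be checked with a short Python sketch. It does not reproduce the RCPF lookup table: the function seed_reciprocal is a hypothetical stand-in that rounds the mantissa of 1/v to eight bits, and reciprocal is an assumed name for the iteration itself.

```python
import math


def seed_reciprocal(v):
    """Stand-in for the RCPF seed (assumption): 1/v with an 8-bit mantissa."""
    m, e = math.frexp(1.0 / v)               # 1/v = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 256) / 256, e)


def reciprocal(v, iterations=2):
    """Refine the seed with x[n+1] = x[n] * (2 - v * x[n])."""
    x = seed_reciprocal(v)
    for _ in range(iterations):
        x = x * (2.0 - v * x)
    return x


v = 3.0
for n in range(3):
    error = abs(v * reciprocal(v, n) - 1.0)
    print(n, error)   # the relative error roughly squares on each iteration
```

If the seed has relative error e, then v x[0] = 1 - e and v x[1] = (1 - e)(1 + e) = 1 - e^2, which is why each iteration doubles the number of accurate bits.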


The TMS320C4x program to implement this algorithm is shown in 
Figure 4—18. Each step of the algorithm is labeled along with the corre- 
sponding accuracy achieved at the end of the step. The algorithm takes only 
seven machine cycles. 


Figure 4-18. Newton-Raphson Algorithm for Computing the Reciprocal
4.12 Reciprocal Square Root (RSQRF Instruction)


The RSQRF instruction generates an estimated reciprocal of the square
root of a floating-point number. It parallels some of the operational
characteristics of the RCPF instruction (Section 4.11):

- It generates an estimate (in this case, the reciprocal of the square root
  of a floating-point number).
- The mantissa is accurate to the eighth binary place (mantissa error is
  < 2^-8).
- Often, this is a satisfactory estimate of the reciprocal of a number’s
  square root; in other cases, it may be used as a seed for an algorithm
  that computes the reciprocal square root to an even greater accuracy.

Figure 4-19 depicts the RSQRF algorithm.

- The input is assumed to be v = vman x 2^vexp.
- The output is assumed to be x = xman x 2^xexp.
- vexp + 1 is negated and shifted right one bit with sign extension.
- If vexp = -128, the result is saturated to the most positive number, and
  the overflow flag is set.


Figure 4-19. RSQRF Instruction Algorithm

[Diagram not reproduced. Recoverable labels: inputs vexp, vexp(0), and
vfrac(22..15); outputs xexp, xfrac(22..15), and xman.]
The lookup table is addressed by forming a nine-bit address consisting of
the least significant bit of vexp and bits 22-15 of vfrac. The eight-bit output
of the lookup table is used to form bits 22-15 of xfrac. Bits 14-0 of xfrac
are set to zero. xsign is set to 0. There is no provision for negative values
of v.


The lookup-table values are generated from simulation results.


Of course, given the result of this algorithm, division is performed by a
simple multiplication: y/v = y x[n], where x[n] is the estimate of 1/v as
determined by the Newton-Raphson algorithm or another algorithm.


4.12.1 Reciprocal Square Root Algorithm


The RSQRF instruction provides the reciprocal of the square root of a
number. The estimate has the correct exponent and a mantissa accurate to the
eighth binary place (i.e., the error of the mantissa is < 2^-8). The
Newton-Raphson algorithm (shown below) may be used to further extend the
mantissa’s precision:


   x[n+1] = x[n] (1.5 - (v/2) x[n] x[n])


where v = the number whose reciprocal square root is to be found.


The seed for the algorithm, x[0], is given by RSQRF. For each iteration of
the algorithm, the number of accurate bits in the mantissa doubles. Using
RSQRF, you can start with an estimate accurate to eight bits.
With one iteration, accuracy is 16 bits in the mantissa, and with a second
iteration, accuracy is 32 bits.
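
A short Python sketch of this iteration follows. As with the reciprocal, the RSQRF lookup table is not reproduced: seed_rsqrt is a hypothetical stand-in that rounds the mantissa of 1/sqrt(v) to eight bits, and rsqrt is an assumed name for the refinement loop.

```python
import math


def seed_rsqrt(v):
    """Stand-in for the RSQRF seed (assumption): 1/sqrt(v) with an 8-bit mantissa."""
    m, e = math.frexp(1.0 / math.sqrt(v))    # 0.5 <= m < 1
    return math.ldexp(round(m * 256) / 256, e)


def rsqrt(v, iterations=2):
    """Refine the seed with x[n+1] = x[n] * (1.5 - (v / 2) * x[n] * x[n])."""
    x = seed_rsqrt(v)
    for _ in range(iterations):
        x = x * (1.5 - (v / 2.0) * x * x)
    return x


v = 2.0
x = rsqrt(v)
print(x)        # estimate of 1/sqrt(2)
print(v * x)    # the square root itself, recovered by one multiplication
```

The last line illustrates why the reciprocal square root is a convenient primitive: multiplying the estimate by v gives sqrt(v) without a separate square-root routine.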
The TMS320C4x program to implement this algorithm is shown in
Figure 4-20. Each step of the algorithm is labeled, and the corresponding
accuracy achieved is noted at the end of the step. The algorithm takes only
ten machine cycles (compared to 30 cycles on the ’C3x without a lookup
table).


4.12.2 Background on the Reciprocal Square Root 


In many applications, normalization of data values is necessary. Often, the
normalizing factor is the square root of another quantity. For example, given
a vector, the unit vector in the same direction can be found by dividing the
original vector by its length. This involves division by a square root. The
’C40 provides a simple way to determine this quantity directly, instead of
going through a two-step approach of finding the square root and then finding
its reciprocal.


Of course, given the result of this algorithm, the square root is found by a
simple multiplication:


   sqrt(v) = v x[n]


where x[n] is the estimate of 1/sqrt(v) as determined by the Newton-Raphson
algorithm or some other algorithm.
Chapter 5


Addressing

The TMS320C40 supports four groups of powerful addressing modes. Five
types of addressing may be used within the groups, which facilitates access
of data from memory, registers, and the instruction word. This chapter
details the operation, encoding, and implementation of the addressing modes.
It also discusses the management of system stacks, queues, and dequeues
in memory. The major topics in this chapter:


Section
5.1 Types of Addressing
5.1 Types of Addressing


Five types of addressing allow access of data from memory, registers, and
the instruction word:

- Register addressing (Subsection 5.1.1)
- Direct addressing (Subsection 5.1.2)
- Indirect addressing (Subsection 5.1.3)
- Immediate addressing (Subsection 5.1.4)
- PC-relative addressing (Subsection 5.1.5)


Some types of addressing are appropriate for some instructions and not
others. For this reason, the types of addressing are used in the four different
groups of addressing modes as follows:

- General addressing modes (G), Subsection 5.2.1:
    - Register
    - Direct
    - Indirect
    - Immediate
- Three-operand addressing modes (T), Subsection 5.2.2:
    - Register
    - Immediate
    - Indirect
- Parallel addressing modes (P), Subsection 5.2.3:
    - Register
    - Indirect
- Conditional-branch addressing modes (B), Subsection 5.2.4:
    - Register
    - PC-relative


The five types of addressing are discussed first (Subsections 5.1.1 through
5.1.5), followed by the four groups of addressing modes (Section 5.2).


5.1.1 Register Addressing 


In register addressing, a CPU register contains the operand, as shown in
this example:


   ABSF R1       ; R1 = |R1|


The syntax for the CPU registers, the assembler syntax, and the assigned 
function for those registers are listed in Table 5—1. 


Table 5-1. CPU Register/Assembler Syntax and Function


Register Machine Value | Assembler Syntax | Assigned Function


Extended-precision register
Extended-precision register
Extended-precision register
Extended-precision register
Extended-precision register
Extended-precision register
Extended-precision register
Extended-precision register
Extended-precision register
Extended-precision register
Extended-precision register
Extended-precision register


Auxiliary register 0
Auxiliary register 1
Auxiliary register 2
Auxiliary register 3
Auxiliary register 4
Auxiliary register 5
Auxiliary register 6
Auxiliary register 7


Data-page pointer
Index register 0
Index register 1
Block-size register
Active stack pointer


Status register
DMA coprocessor interrupt enable
Internal interrupt enable register
IIOF pins and interrupt flag register


Repeat start address
Repeat end address
Repeat counter
5.1.2 Direct Addressing


In direct addressing, the data address is formed by the concatenation of the
16 least significant bits of the data page pointer (DP) with the 16 least
significant bits of the instruction word (expr). This results in 65536 pages
(64K words per page), giving you a large address space without requiring a
change of the page pointer. The syntax and operation for direct addressing
are listed below.


Syntax:     @expr
Operation:  address = DP concatenated with expr


Figure 5-1 shows the formation of the data address. Example 5-1 gives an
instruction example with data before and after instruction execution.


Figure 5-1. Direct Addressing

[Diagram not reproduced. Recoverable content: bits 15-0 of the data page
pointer (DP) form bits 31-16 of the address, and bits 15-0 of the instruction
word (expr) form bits 15-0 of the address.]


Example 5-1. Direct Addressing

   ADDI @0BCDEh, R7

Before Instruction:                  After Instruction:
DP = 108Ah                           DP = 108Ah
R7 = 11h                             R7 = 1234 5689h
Data at 108A BCDEh = 1234 5678h      Data at 108A BCDEh = 1234 5678h
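
The address formation in Example 5-1 can be reproduced with a few lines of Python. This is only a sketch: the dictionary used as memory and the function name direct_address are hypothetical, and the 32-bit wraparound on the add is an assumption of the model.

```python
def direct_address(dp, expr):
    """Concatenate the 16 LSBs of DP with the 16-bit expr field."""
    return ((dp & 0xFFFF) << 16) | (expr & 0xFFFF)


# Hypothetical memory model holding the single word used in Example 5-1.
memory = {0x108ABCDE: 0x12345678}

dp = 0x108A
r7 = 0x11

# ADDI @0BCDEh, R7: add the addressed word to R7 (32-bit wraparound assumed).
r7 = (r7 + memory[direct_address(dp, 0xBCDE)]) & 0xFFFFFFFF

print(f"{direct_address(dp, 0xBCDE):08X}")  # 108ABCDE
print(f"{r7:08X}")                          # 12345689
```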
5.1.3 Indirect Addressing


Indirect addressing is used to specify the address of an operand in memory
through the contents of an auxiliary register, optional displacements, and
index registers. This arithmetic is performed by the auxiliary register
arithmetic units (ARAUs) and is unsigned. (All 32 bits of the auxiliary and
index registers are used in indirect addressing.)


The flexibility of indirect addressing is possible because the ARAUs on the
TMS320C40 are used to modify auxiliary registers in parallel with operations
within the main CPU. Indirect addressing is specified by a five-bit field
in the instruction word, referred to as the mod field (shown in the left
column of Table 5-2 as well as in the examples that follow). A displacement
is either an explicit unsigned 5-bit or 8-bit integer contained in the
instruction word or an implicit displacement of one. Two index registers, IR0
and IR1, can also be used in indirect addressing, enabling the use of 32-bit
indirect displacements. In some cases, an addressing scheme using circular
or bit-reversed addressing is optional. The mechanism for generating
addresses in circular addressing is discussed in Section 5.3; bit-reversed
addressing is discussed in Section 5.4.


Table 5-2 lists the various kinds of indirect addressing, along with the value
of the modification (mod) field, assembler syntax, operation, and function
for each. The succeeding 18 examples show the operation for each kind of
indirect addressing.
Table 5-2. Indirect Addressing


Mod Field  Syntax           Operation                 Description

Indirect Addressing with Displacement

00000      *+ARn(disp)      addr = ARn + disp         With predisplacement add
00001      *-ARn(disp)      addr = ARn - disp         With predisplacement subtract
00010      *++ARn(disp)     addr = ARn + disp         With predisplacement add and modify
                            ARn = ARn + disp
00011      *--ARn(disp)     addr = ARn - disp         With predisplacement subtract and modify
                            ARn = ARn - disp
00100      *ARn++(disp)     addr = ARn                With postdisplacement add and modify
                            ARn = ARn + disp
00101      *ARn--(disp)     addr = ARn                With postdisplacement subtract and modify
                            ARn = ARn - disp
00110      *ARn++(disp)%    addr = ARn                With postdisplacement add and circular modify
                            ARn = circ(ARn + disp)
00111      *ARn--(disp)%    addr = ARn                With postdisplacement subtract and circular modify
                            ARn = circ(ARn - disp)

Indirect Addressing with Index Register IR0

01000      *+ARn(IR0)       addr = ARn + IR0          With preindex (IR0) add
01001      *-ARn(IR0)       addr = ARn - IR0          With preindex (IR0) subtract
01010      *++ARn(IR0)      addr = ARn + IR0          With preindex (IR0) add and modify
                            ARn = ARn + IR0
01011      *--ARn(IR0)      addr = ARn - IR0          With preindex (IR0) subtract and modify
                            ARn = ARn - IR0
01100      *ARn++(IR0)      addr = ARn                With postindex (IR0) add and modify
                            ARn = ARn + IR0
01101      *ARn--(IR0)      addr = ARn                With postindex (IR0) subtract and modify
                            ARn = ARn - IR0
01110      *ARn++(IR0)%     addr = ARn                With postindex (IR0) add and circular modify
                            ARn = circ(ARn + IR0)
01111      *ARn--(IR0)%     addr = ARn                With postindex (IR0) subtract and circular modify
                            ARn = circ(ARn - IR0)

Indirect Addressing with Index Register IR1

10000      *+ARn(IR1)       addr = ARn + IR1          With preindex (IR1) add
10001      *-ARn(IR1)       addr = ARn - IR1          With preindex (IR1) subtract
10010      *++ARn(IR1)      addr = ARn + IR1          With preindex (IR1) add and modify
                            ARn = ARn + IR1
10011      *--ARn(IR1)      addr = ARn - IR1          With preindex (IR1) subtract and modify
                            ARn = ARn - IR1
10100      *ARn++(IR1)      addr = ARn                With postindex (IR1) add and modify
                            ARn = ARn + IR1
10101      *ARn--(IR1)      addr = ARn                With postindex (IR1) subtract and modify
                            ARn = ARn - IR1
10110      *ARn++(IR1)%     addr = ARn                With postindex (IR1) add and circular modify
                            ARn = circ(ARn + IR1)
10111      *ARn--(IR1)%     addr = ARn                With postindex (IR1) subtract and circular modify
                            ARn = circ(ARn - IR1)

Indirect Addressing (Special Cases)

11000      *ARn             addr = ARn                Indirect
11001      *ARn++(IR0)B     addr = ARn                With postindex (IR0) add and bit-reversed modify
                            ARn = B(ARn + IR0)


LEGEND:
addr    = memory address
ARn     = auxiliary register AR0 - AR7
IRn     = index register IR0 or IR1
disp    = displacement (5 bits or 8 bits on ’C40)
++      = add and modify
--      = subtract and modify
circ( ) = address in circular addressing
%       = where circular addressing is performed
B       = where bit-reversed addressing is performed
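
The difference between the pre- and post-modify forms in Table 5-2 can be illustrated with a small Python model. It covers only the noncircular, non-bit-reversed kinds, treats the displacement or index register value as a single step argument, and assumes 32-bit unsigned wraparound; the function name indirect and its string keys are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # auxiliary-register arithmetic is unsigned 32-bit


def indirect(syntax, ar, step=1):
    """Return (operand address, updated ARn) for one noncircular kind.

    step is the displacement or index register value; 1 is the implied value.
    """
    if syntax == "*ARn":                      # plain indirect
        return ar, ar
    if syntax == "*+ARn":                     # pre-add, ARn unchanged
        return (ar + step) & MASK32, ar
    if syntax == "*-ARn":                     # pre-subtract, ARn unchanged
        return (ar - step) & MASK32, ar
    if syntax == "*++ARn":                    # pre-add and modify
        addr = (ar + step) & MASK32
        return addr, addr
    if syntax == "*--ARn":                    # pre-subtract and modify
        addr = (ar - step) & MASK32
        return addr, addr
    if syntax == "*ARn++":                    # post-add and modify
        return ar, (ar + step) & MASK32
    if syntax == "*ARn--":                    # post-subtract and modify
        return ar, (ar - step) & MASK32
    raise ValueError(f"unsupported syntax: {syntax}")


print(indirect("*+ARn", 0x100, 4))    # address moves, register does not
print(indirect("*++ARn", 0x100, 4))   # address and register both move
print(indirect("*ARn++", 0x100, 4))   # old address is used, register moves
```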
Example 5-2. Auxiliary Register Indirect


An auxiliary register (ARn) contains the address of the operand to be
fetched.


Operation:           operand address = ARn
Assembler Syntax:    *ARn
Modification Field:  11000


Example 5-3. Indirect With Predisplacement Add


The address of the operand to be fetched is the sum of an auxiliary register
(ARn) and the displacement (disp). The displacement is either a 5-bit or 8-bit
unsigned integer contained in the instruction word or an implied value of 1.


Operation:           operand address = ARn + disp
Assembler Syntax:    *+ARn(disp)
Modification Field:  00000

Example 5-4. Indirect With Predisplacement Subtract


The address of the operand to be fetched is the contents of an auxiliary
register (ARn) minus the displacement (disp). The displacement is either an
8-bit unsigned integer contained in the instruction word or an implied value
of 1.


Operation:           operand address = ARn - disp
Assembler Syntax:    *-ARn(disp)
Modification Field:  00001


Example 5-5. Indirect With Predisplacement Add and Modify


The address of the operand to be fetched is the sum of an auxiliary register
(ARn) and the displacement (disp). The displacement is either an 8-bit
unsigned integer contained in the instruction word or an implied value of 1.
After the data is fetched, the auxiliary register is updated with the address
generated.


Operation:           operand address = ARn + disp
                     ARn = ARn + disp
Assembler Syntax:    *++ARn(disp)
Modification Field:  00010

Example 5-6. Indirect With Predisplacement Subtract and Modify


The address of the operand to be fetched is the contents of an auxiliary
register (ARn) minus the displacement (disp). The displacement is either an
8-bit unsigned integer contained in the instruction word or an implied value
of 1. After the data is fetched, the auxiliary register is updated with the
address generated.


Operation:           operand address = ARn - disp
                     ARn = ARn - disp
Assembler Syntax:    *--ARn(disp)
Modification Field:  00011


Example 5-7. Indirect With Postdisplacement Add and Modify


The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the displacement (disp) is added
to the auxiliary register. The displacement is either an 8-bit unsigned integer
contained in the instruction word or an implied value of 1.


Operation:           operand address = ARn
                     ARn = ARn + disp
Assembler Syntax:    *ARn++(disp)
Modification Field:  00100

Example 5-8. Indirect With Postdisplacement Subtract and Modify


The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the displacement (disp) is
subtracted from the auxiliary register. The displacement is either an 8-bit
unsigned integer contained in the instruction word or an implied value of 1.


Operation:           operand address = ARn
                     ARn = ARn - disp
Assembler Syntax:    *ARn--(disp)
Modification Field:  00101


Example 5-9. Indirect With Postdisplacement Add and Circular Modify


The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the displacement (disp) is added
to the contents of the auxiliary register using circular addressing. This result
is used to update the auxiliary register. The displacement is either an 8-bit
unsigned integer contained in the instruction word or an implied value of 1.


Operation:           operand address = ARn
                     ARn = circ(ARn + disp)
Assembler Syntax:    *ARn++(disp)%
Modification Field:  00110

Example 5-10. Indirect With Postdisplacement Subtract and Circular Modify


The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the displacement (disp) is
subtracted from the contents of the auxiliary register through circular
addressing. This result is used to update the auxiliary register. The
displacement is either an 8-bit unsigned integer contained in the instruction
word or an implied value of 1.


Operation:           operand address = ARn
                     ARn = circ(ARn - disp)
Assembler Syntax:    *ARn--(disp)%
Modification Field:  00111


Example 5-11. Indirect With Preindex Add


The address of the operand to be fetched is the sum of an auxiliary register
(ARn) and an index register (IR0 or IR1).


Operation:           operand address = ARn + IRm
Assembler Syntax:    *+ARn(IRm)
Modification Field:  01000 if m = 0
                     10000 if m = 1

Example 5-12. Indirect With Preindex Subtract


The address of the operand to be fetched is the difference between an
auxiliary register (ARn) and an index register (IR0 or IR1).


Operation:           operand address = ARn - IRm
Assembler Syntax:    *-ARn(IRm)
Modification Field:  01001 if m = 0
                     10001 if m = 1


Example 5-13. Indirect With Preindex Add and Modify


The address of the operand to be fetched is the sum of an auxiliary register
(ARn) and an index register (IR0 or IR1). After the data is fetched, the
auxiliary register is updated with the address generated.


Operation:           operand address = ARn + IRm
                     ARn = ARn + IRm
Assembler Syntax:    *++ARn(IRm)
Modification Field:  01010 if m = 0
                     10010 if m = 1

Example 5-14. Indirect With Preindex Subtract and Modify


The address of the operand to be fetched is the difference between an
auxiliary register (ARn) and an index register (IR0 or IR1). The resulting
address becomes the new contents of the auxiliary register.


Operation:           operand address = ARn - IRm
                     ARn = ARn - IRm
Assembler Syntax:    *--ARn(IRm)
Modification Field:  01011 if m = 0
                     10011 if m = 1


Example 5-15. Indirect With Postindex Add and Modify


The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the index register (IR0 or IR1) is
added to the auxiliary register.


Operation:           operand address = ARn
                     ARn = ARn + IRm
Assembler Syntax:    *ARn++(IRm)
Modification Field:  01100 if m = 0
                     10100 if m = 1

Example 5-16. Indirect With Postindex Subtract and Modify


The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the index register (IR0 or IR1) is
subtracted from the auxiliary register.


Operation:           operand address = ARn
                     ARn = ARn - IRm
Assembler Syntax:    *ARn--(IRm)
Modification Field:  01101 if m = 0
                     10101 if m = 1


Example 5-17. Indirect With Postindex Add and Circular Modify


The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the index register (IR0 or IR1) is
added to the auxiliary register. This value is evaluated using circular
addressing and replaces the contents of the auxiliary register.


Operation:           operand address = ARn
                     ARn = circ(ARn + IRm)
Assembler Syntax:    *ARn++(IRm)%
Modification Field:  01110 if m = 0
                     10110 if m = 1

Example 5-18. Indirect With Postindex Subtract and Circular Modify


The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the index register (IR0 or IR1) is
subtracted from the auxiliary register. This result is evaluated through
circular addressing and replaces the contents of the auxiliary register.


Operation:           operand address = ARn
                     ARn = circ(ARn - IRm)
Assembler Syntax:    *ARn--(IRm)%
Modification Field:  01111 if m = 0
                     10111 if m = 1


Example 5-19. Indirect With Postindex Add and Bit-Reversed Modify


The address of the operand to be fetched is the contents of an auxiliary
register (ARn). After the operand is fetched, the index register (IR0) is added
to the auxiliary register. This addition is performed with a reverse-carry
propagation and can be used to yield a bit-reversed (B) address. This value
replaces the contents of the auxiliary register.


Operation:           operand address = ARn
                     ARn = B(ARn + IR0)
Assembler Syntax:    *ARn++(IR0)B
Modification Field:  11001
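
Reverse-carry addition can be modeled in Python by reversing the bits of both operands, adding normally, and reversing the result. This is a sketch of the general technique under stated assumptions, not a description of the ARAU hardware: the function names are hypothetical, and the choice of IR0 = 4 for an 8-entry table is made up for the illustration.

```python
def reverse_bits(value, width):
    """Return value with its lowest `width` bits in reversed order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(a, b, width=32):
    """Add a and b with carries propagating from MSB toward LSB."""
    mask = (1 << width) - 1
    total = reverse_bits(a & mask, width) + reverse_bits(b & mask, width)
    return reverse_bits(total & mask, width)


def bit_reversed_sequence(start, step, count, width=32):
    """Addresses produced by repeated *ARn++(IR0)B style updates."""
    addresses, ar = [], start
    for _ in range(count):
        addresses.append(ar)
        ar = reverse_carry_add(ar, step, width)
    return addresses


# 8-entry table at a hypothetical base address 100h, IR0 = 4 (half the table size).
print([hex(a) for a in bit_reversed_sequence(0x100, 4, 8)])
```

With these inputs the offsets from the base run 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed ordering of 0 through 7.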
5.1.4 Immediate Addressing


In immediate addressing, the operand is a 16-bit immediate value
contained in the 16 least significant bits of the instruction word (expr).
Depending upon the data types assumed for the instruction, the immediate
operand may be a twos-complement integer, an unsigned integer, or a
floating-point number. This is the syntax for this mode:


Syntax: expr


Example 5-20 gives an instruction example with before- and after-instruction
data.


Example 5-20. Immediate Addressing


Instruction        Before Instruction:   After Instruction:

SUBI 1, R0         R0 = 0h               R0 = 00 FFFF FFFFh
LDI  0FFFFh, R0    R0 = 0h               R0 = 00 FFFF FFFFh
LDF  5.0, R0       R0 = 0h               R0 = 02 2000 0000h
OR   0FFFFh, R0    R0 = 0h               R0 = 00 0000 FFFFh
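
The integer rows of Example 5-20 differ only in how the 16-bit field is extended to 32 bits. The following Python sketch models that difference; it assumes that LDI and SUBI treat the field as a twos-complement integer and OR treats it as unsigned, which is what the before and after values suggest, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret a 16-bit field as a twos-complement integer, as a 32-bit pattern."""
    imm &= 0xFFFF
    return (imm - 0x10000 if imm & 0x8000 else imm) & MASK32


def zero_extend16(imm):
    """Interpret a 16-bit field as an unsigned integer."""
    return imm & 0xFFFF


r0 = 0
print(f"{(r0 - sign_extend16(1)) & MASK32:08X}")   # SUBI 1, R0      -> FFFFFFFF
print(f"{sign_extend16(0xFFFF):08X}")              # LDI 0FFFFh, R0  -> FFFFFFFF
print(f"{r0 | zero_extend16(0xFFFF):08X}")         # OR 0FFFFh, R0   -> 0000FFFF
```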


5.1.5 PC-Relative Addressing


PC-relative addressing is used for branching. Instructions of this type
include Bcond, BcondD, BcondAF, BcondAT, DBcond and DBcondD (repeat
block), and LAJ (link and jump). It updates the PC by adding a displacement
held in the 16 or 24 least significant bits of the instruction word. The
assembler takes the src (a label or address) specified by the user and
generates a displacement. If the branch is a standard branch, this
displacement is equal to [label - (PC + 1)]. If the branch is a delayed
branch, this displacement is equal to [label - (PC + 3)].


The displacement is stored as a 16-bit signed integer in the least significant
bits of the instruction word.


Syntax: expr


Example 5-21 gives an instruction example with before- and after-instruction
data.


Example 5-21. PC-Relative Addressing


   BU NEWPC     ; pc = 1, NEWPC = 5, displacement = 3

Before Instruction:     After Instruction:
PC = 1h                 PC = 5h
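
The displacement arithmetic can be checked with a short Python sketch. The function names displacement and branch_target are hypothetical; the offsets of 1 and 3 are the ones stated above for standard and delayed branches.

```python
def displacement(label, pc, delayed=False):
    """Displacement the assembler generates for a branch at address pc."""
    return label - (pc + (3 if delayed else 1))


def branch_target(pc, disp, delayed=False):
    """New PC obtained by applying a stored displacement."""
    return pc + (3 if delayed else 1) + disp


disp = displacement(label=5, pc=1)   # the BU NEWPC case from Example 5-21
print(disp)                          # 3
print(branch_target(1, disp))        # 5
```

For the same label and PC, a delayed branch would store a displacement that is smaller by 2, because the reference point is PC + 3 instead of PC + 1.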
The 24-bit addressing mode is used to encode the program control
instructions (e.g., BR, BRD, CALL, RPTB, RPTBD, LAJ). Depending on the
instruction, the new PC value is derived by adding a 24-bit signed value in the
instruction word to the present PC value. Bit 24 determines the type of
branch (D = 0 for a standard branch or D = 1 for a delayed branch). Some of
these instructions are encoded in Figure 5-2.


Figure 5-2. Encoding for 24-Bit PC-Relative Addressing Mode

[Encoding diagrams not reproduced. Recoverable content:
(a) BR, BRD: unconditional branches (not delayed and delayed)
(b) CALL: unconditional subroutine call
(c) RPTB, RPTBD: repeat block (not delayed and delayed)
(d) LAJ: link and jump (return address in extended-precision register R11)
In each case, the upper bits of the instruction word hold the opcode,
including the D bit where applicable, and bits 23-0 hold src.]



5.2 Groups of Addressing Modes 


Six types of addressing (covered in Section 5.1) form these four groups of
addressing modes:

- General addressing modes (G), subsection 5.2.1
- Three-operand addressing modes (T), subsection 5.2.2
- Parallel addressing modes (P), subsection 5.2.3
- Conditional-branch addressing modes (B), subsection 5.2.4

5.2.1 General Addressing Modes 


Instructions that use the general addressing modes are general-purpose in- 
structions, such as ADDI, MPYF, and LSH. Such instructions usually have 
this form: 


dst operation src -> dst


where the destination operand is signified by dst and the source operand
by src; operation defines the operation to be performed, and the general
addressing modes specify the operands. Bits 31-29 are zero, indicating
general addressing mode instructions. Bits 22 and 21 specify the general
addressing mode (G) field, which defines how bits 15 through 0 are to be
interpreted for addressing the src operand.


Options for bits 22 and 21 (G field) are as follows:

G      Mode
0 0    register (all CPU registers unless specified otherwise)
0 1    direct
1 0    indirect
1 1    immediate


If the src and dst fields contain register specifications, the value in these
fields contains the CPU register addresses as defined by Table 5-1. For the
general addressing modes, the following values of ARn are valid for indirect
addressing:


ARn, 0 ≤ n ≤ 7


Figure 5-3 shows the encoding for the general addressing modes. The
notation modn indicates the modification field that goes with the ARn field.
Refer to Table 5-2 for further information.
Figure 5-3. Encoding for General Addressing Modes

Mode       31-29  28-23      22-21 (G)  20-16  15-0 (source operand)
Register   000    operation  00         dst    bits 15-5 = 0, bits 4-0 = src
Direct     000    operation  01         dst    bits 15-0 = direct
Indirect   000    operation  10         dst    bits 15-11 = modn, bits 10-8 = ARn,
                                               bits 7-0 = disp
Immediate  000    operation  11         dst    bits 15-0 = immediate
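
As a rough illustration of how the G field selects the interpretation of bits
15-0, the following Python sketch splits a 32-bit instruction word into the
fields named in Figure 5-3. The function name and the returned dictionary
layout are hypothetical; the bit positions are taken from the figure and the
surrounding text.

```python
G_MODES = {0b00: "register", 0b01: "direct", 0b10: "indirect", 0b11: "immediate"}


def decode_general(word: int) -> dict:
    """Split a general-addressing-mode instruction word into its fields."""
    word &= 0xFFFFFFFF
    if (word >> 29) & 0x7 != 0:
        raise ValueError("bits 31-29 are not zero: not a general addressing mode word")
    g = (word >> 21) & 0x3
    fields = {
        "operation": (word >> 23) & 0x3F,  # bits 28-23
        "mode": G_MODES[g],                # bits 22-21 (G field)
        "dst": (word >> 16) & 0x1F,        # bits 20-16
    }
    low = word & 0xFFFF                    # bits 15-0, meaning depends on G
    if g == 0b00:
        fields["src"] = low & 0x1F         # register address in bits 4-0
    elif g == 0b10:
        fields["modn"] = (low >> 11) & 0x1F
        fields["arn"] = (low >> 8) & 0x7
        fields["disp"] = low & 0xFF
    else:
        fields["src"] = low                # 16-bit direct address or immediate
    return fields
```

The sketch does not interpret the operation code or sign-extend immediates;
those depend on the individual instruction.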


5.2.2 Three-Operand Addressing Modes


The 19 three-operand instructions on the ’C40 use the eight address forms
listed in Table 5-3.


Table 5-3. Three-Operand Instruction Address Forms

Type 1†

T   src1 addressing modes                  src2 addressing modes                  dst‡
00  register mode (any CPU register)       register mode (any CPU register)       Rx
01  indirect mode (disp = 0, 1, IR0, IR1)  register mode (any CPU register)       Rx
10  register mode (any CPU register)       indirect mode (disp = 0, 1, IR0, IR1)  Rx
11  indirect mode (disp = 0, 1, IR0, IR1)  indirect mode (disp = 0, 1, IR0, IR1)  Rx

Type 2†

T   src1 addressing modes                  src2 addressing modes                  dst‡
00  register mode (any CPU register)       8-bit signed immediate                 Rx
01  register mode (any CPU register)       indirect mode *+ARn(5-bit unsigned     Rx
                                           displacement)
10  indirect mode *+ARn(5-bit unsigned     8-bit signed immediate                 Rx
    displacement)
11  indirect mode *+ARn1(5-bit unsigned    indirect mode *+ARn2(5-bit unsigned    Rx
    displacement)                          displacement)

† The ’C40 recognizes either type 1 or type 2 instructions; the ’C30 recognizes only type 1.
‡ Rx = any register in the CPU (primary) register file for the respective processor.


The object values differ for three-operand instructions, depending on the
assembler used:

- The TMS320C3x assembler recognizes only type 1 formats and sets bits
  31-28 to 0010 (binary).

- The TMS320C4x assembler recognizes both types and sets bits 31-28
  to 0010 (binary) for type 1 and to 0011 (binary) for type 2.
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The ’C4x processor executes both types (1 and 2). The ’C30 executes only 
the type 1 format. The three-operand instructions MPYSHI3 and MPYUHI3 
are unique to the ’C40. 


All instructions except four can use all four of the type 2 address forms
shown in Table 5-3. These exceptions, which can use only address forms
2 and 4 in type 2, are the floating-point instructions ADDF3, CMPF3,
MPYF3, and SUBF3.


The remaining 15 three-operand instructions are ADDC3, ADDI3, AND3,
ANDN3, ASH3, CMPI3, LSH3, MPYI3, MPYSHI3, MPYUHI3, OR3,
SUBB3, SUBI3, TSTB3, and XOR3.


Note that the 3 can be omitted from a three-operand instruction mnemonic. 


Bits 22 and 21 specify the three-operand addressing mode (T) field, which
defines how bits 15-0 are to be interpreted for addressing the src operands.
Bits 15-8 define the src1 address, and bits 7-0 define the src2 address.


Figure 5-4 and Figure 5-5 show the encoding for ’C4x three-operand
addressing (the ’C30 recognizes only the format in Figure 5-4). The notation
modm or modn indicates that the modification field goes with the ARm or
ARn field, respectively. Refer to Table 5-2 (page 5-6) for further
information.
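
The field positions stated above can be sketched as a small Python decoder.
This is an illustrative model with hypothetical names. It assumes that bits
31-28 distinguish type 1 (0010) from type 2 (0011) as set by the ’C4x
assembler, and that "address forms 2 and 4" correspond to T = 01 and T = 11,
counting the rows of Table 5-3 from the top.

```python
FLOAT_THREE_OPERAND = {"ADDF3", "CMPF3", "MPYF3", "SUBF3"}


def decode_three_operand(word: int) -> dict:
    """Extract the type, T field, and raw src1/src2 fields of a three-operand word."""
    word &= 0xFFFFFFFF
    top = (word >> 28) & 0xF
    if top == 0b0010:
        kind = 1
    elif top == 0b0011:
        kind = 2
    else:
        raise ValueError("bits 31-28 do not mark a three-operand instruction")
    return {
        "type": kind,
        "t": (word >> 21) & 0x3,     # bits 22-21
        "src1": (word >> 8) & 0xFF,  # bits 15-8
        "src2": word & 0xFF,         # bits 7-0
    }


def type2_form_allowed(mnemonic: str, t: int) -> bool:
    """Floating-point three-operand instructions may use only forms 2 and 4 in type 2."""
    if mnemonic.upper() in FLOAT_THREE_OPERAND:
        return t in (0b01, 0b11)
    return True
```

How the src1 and src2 bytes are interpreted (register, indirect, or immediate)
still depends on the type and T value, as listed in Table 5-3.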


The 8-bit signed immediate value supports left shifts, right shifts, and 
memory increment and decrement operations. The immediate value is not 
available for floating-point operations. 


These instructions greatly help reduce code size, both assembled and com- 
piled. They also give noticeable performance improvements in DSP and 
other computationally intensive applications and general-purpose code. 
Figure 5-4. Encoding for Type 1 Three-Operand Addressing Modes ('C30 and 'C40)

[Bit-field diagram not reproduced. Bits 31-28 = 0010, followed by the
operation field, the T field (bits 22-21), the dst field, src1 (bits 15-8),
and src2 (bits 7-0).]


Figure 5-5. Encoding for Type 2 Three-Operand Addressing Modes ('C40 Only)

[Bit-field diagram not reproduced.]
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5.2.3 Parallel Addressing Modes 


Instructions that use parallel addressing, indicated by || (two vertical bars),
allow for the greatest amount of parallelism possible. The destination
operands are indicated as d1 and d2, signifying dst1 and dst2, respectively
(see Figure 5-6). The source operands, signified by src1 and src2, use the
extended-precision registers. The parallel operation to be performed is called
operation.


Figure 5-6. Encoding for Parallel Addressing Modes 
[Bit-field diagram not reproduced; the field positions are given in the
paragraph that follows.]


The parallel addressing mode (P) field specifies how the operands are to
be used, i.e., whether they are source or destination. The specific
relationship between the P field and the operands is detailed in the
description of the individual parallel instructions (see Chapter 11). However,
the operands are always encoded in the same way. Bits 31 and 30 are set to the
value of 10, indicating parallel addressing mode instructions. Bits 25 and 24
specify the parallel addressing mode (P) field, which defines how bits 21-0
are to be interpreted for addressing the src operands. Bits 21-19 are used to
define the src1 address, bits 18-16 to define the src2 address, bits 15-8 the
src3 address, and bits 7-0 the src4 address. The notations modn and
modm indicate which modification field goes with which ARn or ARm
(auxiliary register) field, respectively. The parallel addressing operands are
listed below.


src1   Rn (0 ≤ n ≤ 7 for extended-precision registers R0-R7)
src2   Rn (0 ≤ n ≤ 7 for extended-precision registers R0-R7)
d1     If 0, dst1 is R0. If 1, dst1 is R1.
d2     If 0, dst2 is R2. If 1, dst2 is R3.
P      0 ≤ P ≤ 3
src3   indirect (disp = 0, 1, IR0, IR1)
src4   indirect (disp = 0, 1, IR0, IR1)


As in the three-operand addressing mode, indirect addressing in the parallel
addressing mode allows for displacements of 0 or 1 and the use of the index
registers (IR0 and IR1). The displacement of 1 is implied and is not explicitly
coded in the instruction word.


In the encoding shown for this mode in Figure 5-6, if the src3 and src4 fields
use the same auxiliary register, both addresses are correctly generated, but
only the value created by the src3 field is saved in the auxiliary register
specified. The assembler issues a warning if you specify this condition.


5.2.4 Conditional-Branch Addressing Modes 


Instructions using the conditional-branch addressing modes (Bcond,
BcondD, CALLcond, DBcond, and DBcondD) can perform a variety of
conditional operations. Bits 31-27 are set to the value of 01101, indicating
conditional-branch addressing mode instructions. Bit 26 is set to 0 or 1; the
former selects DBcond, the latter Bcond. Selection of bit 25 determines the
conditional-branch addressing mode (B). If B = 0, register addressing is
used; if B = 1, PC-relative addressing is used. Selection of bit 21 sets the
type of branch: D = 0 for a standard branch or D = 1 for a delayed branch.
The condition field (cond) specifies the condition checked to determine what
action to take, i.e., whether or not to branch (see Table 11-8 on page 11-12
for a list of condition codes). Figure 5-7 shows the encoding for
conditional-branch addressing.


Figure 5-7. Encoding for Conditional-Branch Addressing Modes 
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5.3 Circular Addressing 


Many algorithms, such as convolution and correlation, require the imple- 
mentation of a circular buffer in memory. In convolution and correlation, the 
circular buffer is used to implement a sliding window that contains the most 
recent data to be processed. As new data is brought in, the new data over- 
writes the oldest data. The key to the implementation of a circular buffer is 
the implementation of a circular addressing mode. This section describes 
the circular addressing mode of the TMS320C40. 


The block-size register (BK) specifies the size of the circular buffer. The
bottom of the circular buffer is specified by the first 1 (one) bit (counting
from the most significant bit to the least significant bit) in the lower 16
bits of the BK register, plus a user-selected auxiliary register (ARn). With
the location of the first 1 bit specified as bit N, the address at the top of
the buffer is referred to as the effective base (EB) and is equal to bits 31
through (N+1) of ARn with bits N through 0 of EB being zero.


Figure 5-8 illustrates the relationships among the block-size register (BK), 
the auxiliary registers (ARn), the bottom of the circular buffer, the top of the 
circular buffer, and the index into the circular buffer. 
Figure 5-8. Flowchart for Circular Addressing

[Flowchart not reproduced.]

LEGEND:

ARn = auxiliary register n      L   = low-order bits
BK  = block-size register       L'  = new low-order bits
EB  = effective base            LSB = least significant bit
H   = high-order bits           N   = bit value


In circular addressing, index refers to the N LSBs of the auxiliary register
selected, and step is the quantity being added to or subtracted from the
auxiliary register. Follow these two rules when you use circular addressing:

- The step used must be less than or equal to the block size.

- The first time the circular queue is addressed, the auxiliary register must
  be pointing to an element in the circular queue.


The algorithm for circular addressing is as follows:

If 0 ≤ index + step < BK:
    index = index + step.

Else if index + step ≥ BK:
    index = index + step - BK.

Else if index + step < 0:
    index = index + step + BK.
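
The three-branch algorithm above can be modeled in a few lines of Python. This
is a sketch with hypothetical function names, not a description of the
hardware; it assumes that the index occupies bits N through 0 of ARn, where N
is the position of the most significant 1 bit in the lower 16 bits of BK, and
that the remaining high-order bits form the effective base.

```python
def circular_step(index: int, step: int, bk: int) -> int:
    """Apply one circular update to an index, following the three cases above."""
    if abs(step) > bk:
        raise ValueError("step must be less than or equal to the block size")
    total = index + step
    if 0 <= total < bk:
        return total
    if total >= bk:
        return total - bk
    return total + bk  # total < 0


def circular_update(arn: int, step: int, bk: int) -> int:
    """Update a full auxiliary-register value, keeping the effective base fixed."""
    bk &= 0xFFFF
    if bk == 0:
        raise ValueError("block size must be nonzero")
    mask = (1 << bk.bit_length()) - 1  # covers bits N..0
    base = arn & ~mask                 # effective base (EB)
    index = arn & mask
    return base | circular_step(index, step, bk)
```

Starting from AR0 = 0 with BK = 6 and applying steps of +5, +2, -3, +6, and -1
yields 5, 1, 4, 4, and 3, which is the sequence listed in the circular
addressing example of Figure 5-10.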


Figure 5-9 shows how the circular buffer is implemented. It illustrates the
relationship of the quantities generated and the elements in the circular
buffer.


Figure 5-9. Circular Buffer Implementation

[Diagram not reproduced.]


Figure 5-10 gives an example of the operation of circular addressing.
Assuming that all ARs are four bits, let AR0 = 0000 (binary) and BK = 0110
(binary), a block size of 6. This example shows a sequence of modifications
and the resulting value of AR0. It also shows how the pointer steps through
the circular queue with a variety of step sizes (both incrementally and
decrementally).


Figure 5-10. Circular Addressing Example

    *AR0++(5)%    ; AR0 = 0 (0th value)
    *AR0++(2)%    ; AR0 = 5 (1st value)
    *AR0--(3)%    ; AR0 = 1 (2nd value)
    *AR0++(6)%    ; AR0 = 4 (3rd value)
    *AR0--%       ; AR0 = 4 (4th value)
    *AR0          ; AR0 = 3 (5th value)

[Diagram of values, data elements, and addresses not reproduced.]



Circular addressing is especially useful for the implementation of FIR filters.
Figure 5-11 shows one possible data structure for FIR filters. Note that the
initial value of AR0 points to h(N - 1), and the initial value of AR1 points to
x(0). Circular addressing is used in the TMS320C40 code for the FIR filter
shown in Figure 5-12.


Figure 5-11. Data Structure for FIR Filters

[Diagram of the impulse response and input sample buffers not reproduced.]


Figure 5-12. FIR Filter Code Using Circular Addressing

* Initialization
*
        LDI    N,BK               ; Load block size.
        LDI    H,AR0              ; Load pointer to impulse response.
        LDI    X,AR1              ; Load pointer to bottom of input
                                  ; sample buffer.
*
TOP     LDF    IN,R3              ; Read input sample.
        STF    R3,*AR1++%         ; Store with other samples,
                                  ; and point to top of buffer.
        LDF    0,R0               ; Initialize R0.
        LDF    0,R2               ; Initialize R2.
*
* Filter
*
        RPTS   N - 1              ; Repeat next instruction.
        MPYF3  *AR0++%,*AR1++%,R0
||      ADDF3  R0,R2,R2           ; Multiply and accumulate.
        ADDF   R0,R2              ; Last product accumulated.
*
        STF    R2,Y               ; Save result.
        B      TOP                ; Repeat.
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5.4 Bit-Reversed Addressing 


Bit-reversed addressing on the TMS320C40 improves execution speed
and program memory usage for FFT algorithms that use a variety of radices.
One auxiliary register points to the physical location of a data value. IR0
specifies one-half the size of the FFT; i.e., the value contained in IR0 must
be equal to 2^(n-1), where n is an integer and the FFT size is 2^n. When you
add IR0 to the auxiliary register by using bit-reversed addressing, addresses
are generated in a bit-reversed fashion. The largest index for bit-reversed
addressing is 00FF FFFFh.


To illustrate this kind of addressing, assume 8-bit auxiliary registers. Let
AR2 contain the value 0110 0000 (binary), or 96 decimal. This is the base
address of the data in memory. Let IR0 contain the value 0000 1000 (binary),
or 8 decimal. Figure 5-13 shows a sequence of modifications of AR2 and the
resulting values of AR2.


Figure 5-13. Bit-Reversed Addressing Example

    *AR2++(IR0)B    ; AR2 = 0110 0000 (0th value)
    *AR2++(IR0)B    ; AR2 = 0110 1000 (1st value)
    *AR2++(IR0)B    ; AR2 = 0110 0100 (2nd value)
    *AR2++(IR0)B    ; AR2 = 0110 1100 (3rd value)
    *AR2++(IR0)B    ; AR2 = 0110 0010 (4th value)
    *AR2++(IR0)B    ; AR2 = 0110 1010 (5th value)
    *AR2++(IR0)B    ; AR2 = 0110 0110 (6th value)
    *AR2            ; AR2 = 0110 1110 (7th value)


Table 5-4 shows the relationship of the index steps and the four LSBs of
AR2. As you can see, you can find the four LSBs by reversing the bit pattern
of the steps.
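
One way to picture this address sequence is as an ordinary addition carried
out on bit-reversed values. The Python sketch below is an illustrative model
with hypothetical names, not a description of the hardware adder. It assumes
that IR0 is a power of two and that only the low n bits of the auxiliary
register take part, where n is the bit length of IR0.

```python
def reverse_bits(value: int, width: int) -> int:
    """Reverse the order of the low `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversed_add(arn: int, ir0: int) -> int:
    """Add ir0 to arn with the carry moving toward the least significant bit."""
    if ir0 <= 0 or ir0 & (ir0 - 1):
        raise ValueError("this sketch expects IR0 to be a power of two")
    width = ir0.bit_length()          # n bits for an FFT of size 2**n
    mask = (1 << width) - 1
    low = reverse_bits(arn & mask, width)
    step = reverse_bits(ir0, width)   # 2**(n-1) reversed is 1
    total = (low + step) & mask
    return (arn & ~mask) | reverse_bits(total, width)
```

Starting from AR2 = 0110 0000 with IR0 = 0000 1000, seven successive calls
reproduce the address sequence listed in Figure 5-13.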
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5.5 System and User Stack Management 


The TMS320C40 provides a dedicated system stack pointer (SP) for build- 
ing stacks in memory. The auxiliary registers can also be used to build a vari- 
ety of more general linear lists. This section discusses the implementation 
of the following types of linear lists: 


Stack     A linear list for which all insertions and deletions are made at
          one end of the list.

Queue     A linear list for which all insertions are made at one end of the
          list, and all deletions are made at the other end.

Dequeue   A double-ended queue: a linear list for which insertions and
          deletions are made at either end of the list.


The system stack pointer (SP) is a 32-bit register that contains the address
of the top of the system stack. The system stack fills from low-memory
address to high-memory address (see Figure 5-14). The SP always points to
the last element pushed onto the stack. A push performs a preincrement,
and a pop performs a postdecrement of the system stack pointer.


The program counter is pushed onto the system stack on subroutine calls,
traps, and interrupts. It is popped from the system stack on returns. The
system stack can be pushed and popped with the PUSH, POP, PUSHF, and
POPF instructions.
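
The preincrement push and postdecrement pop described above can be modeled
as follows. This is a minimal Python sketch; the class name and the use of a
dictionary as memory are illustrative assumptions.

```python
class SystemStack:
    """Model of a stack that grows from low to high memory addresses."""

    def __init__(self, sp: int):
        self.sp = sp       # address of the last element pushed
        self.memory = {}   # address -> value

    def push(self, value: int) -> None:
        self.sp += 1                  # preincrement
        self.memory[self.sp] = value

    def pop(self) -> int:
        value = self.memory[self.sp]
        self.sp -= 1                  # postdecrement
        return value
```

After any push, SP holds the address of the element just stored, which matches
the statement that the SP always points to the last element pushed.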


Figure 5-14. System Stack Configuration
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5.5.1 Stacks 


Stacks can be built from low to high memory or high to low memory. Two 
cases for each type of stack are shown. You can build stacks by using the 
preincrement/decrement and postincrement/decrement modes of modify- 
ing the auxiliary registers (AR). You can implement stack growth from high 
to low memory in two ways: 


Case 1: Store to memory using *--ARn to push data onto the stack and
        read from memory using *ARn++ to pop data off the stack.


Case 2: Store to memory using *ARn-- to push data onto the stack
        and read from memory using *++ARn to pop data off the stack.


Figure 5-15 illustrates these two cases. The only difference is that in case 
1, the AR always points to the top of the stack, and in case 2, the AR always 
points to the next free location on the stack. 


Figure 5-15. Implementations of High-to-Low Memory Stacks

[Diagram not reproduced.]
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You can implement stack growth from low to high memory in two ways: 


Case 3: Store to memory using *++ARn to push data onto the stack and
        read from memory using *ARn-- to pop data off the stack.


Case 4: Store to memory using *ARn++ to push data onto the stack and
        read from memory using *--ARn to pop data off the stack.


Figure 5-16 shows these two cases. In case 3, the AR always points to
the top of the stack. In case 4, the AR always points to the next free location
on the stack.


Figure 5-16. Implementations of Low-to-High Memory Stacks

[Diagram not reproduced.]

(a) Case 3          (b) Case 4
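
The four user-stack cases differ only in when the auxiliary register is
modified relative to the memory access. The Python sketch below models all
four; the class and its dictionary-backed memory are hypothetical and serve
only to make the pointer behavior easy to check.

```python
class UserStack:
    """Model of the four ARn-based stack conventions (cases 1 to 4)."""

    def __init__(self, arn: int, case: int):
        if case not in (1, 2, 3, 4):
            raise ValueError("case must be 1, 2, 3, or 4")
        self.arn = arn
        self.case = case
        self.memory = {}

    def push(self, value: int) -> None:
        if self.case == 1:        # *--ARn
            self.arn -= 1
            self.memory[self.arn] = value
        elif self.case == 2:      # *ARn--
            self.memory[self.arn] = value
            self.arn -= 1
        elif self.case == 3:      # *++ARn
            self.arn += 1
            self.memory[self.arn] = value
        else:                     # *ARn++
            self.memory[self.arn] = value
            self.arn += 1

    def pop(self) -> int:
        if self.case == 1:        # *ARn++
            value = self.memory[self.arn]
            self.arn += 1
        elif self.case == 2:      # *++ARn
            self.arn += 1
            value = self.memory[self.arn]
        elif self.case == 3:      # *ARn--
            value = self.memory[self.arn]
            self.arn -= 1
        else:                     # *--ARn
            self.arn -= 1
            value = self.memory[self.arn]
        return value
```

In cases 1 and 3 the register holds the address of the top element after a
push; in cases 2 and 4 it holds the address of the next free location.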


5.5.2 Queues and Dequeues 


The implementation of queues and dequeues is based upon the manipulation
of the auxiliary registers for user stacks. For queues, two auxiliary
registers are used: one to mark the front of the queue from which data is
popped and the other to mark the rear of the queue where data is pushed.


For dequeues, two auxiliary registers are also necessary. One is used to
mark one end of the dequeue, and the other is used to mark the other end.
Data can be popped or pushed from either end.
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The TMS320C40 provides a complete set of constructs that facilitate soft- 
ware and hardware control of the program flow. Software control includes 
repeats, branches, calls, traps, and returns. Hardware control includes 
reset and interrupts. Because programming includes a variety of constructs, 
you can select the one suited for your particular application. 


Several interlocked-operation instructions provide flexible multiprocessor
support and, through the use of external signals, a powerful means of
synchronization. They also guarantee the integrity of the communication
and result in high-speed operation.


The TMS320C40 supports a nonmaskable external reset signal and a 
number of internal and external interrupts. These functions can be pro- 
grammed for a particular application. 


This chapter discusses the following major topics: 
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6.1 Repeat Modes 


The repeat modes of the TMS320C40 can implement zero-overhead loop- 
ing. For many algorithms, most execution time is spent in an inner kernel 
of code. Using the repeat modes allows these time-critical sections of code 
to be executed in the shortest possible time. 


The TMS320C40 provides three instructions to support zero-overhead 
looping: RPTB, RPTBD (repeat a block of code/delayed) and RPTS (repeat 
a single instruction): 


- RPTB and RPTBD cause a block of code to be repeated a specified
  number of times, and

- RPTS causes a single instruction to be repeated a number of times and
  reduces the bus traffic by fetching the instruction only once.


Three registers (RS, RE, and RC) are associated with the updating of the 
program counter when it is updated in a repeat mode, as described in 
Table 6-1 below. 


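Table 6-1. Repeat-Mode Registers

Register   Function
RS         Repeat start address register
RE         Repeat end address register
RC         Repeat count register; tracks how many times the block remains
           to be repeated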


6.1.1 Repeat-Mode Initialization


Two bits are important to the operation of RPTB, RPTBD, and RPTS: the
RM and S bits.

- The RM (repeat-mode flag) bit in the status register specifies whether
  the processor is running in the repeat mode.

    - If RM = 0, fetches are not made in repeat mode.
    - If RM = 1, fetches are made in repeat mode.

- The S bit is internal to the processor and cannot be programmed, but
  this bit is necessary to fully describe the operation of RPTB and RPTS.

    - If RM = 1 and S = 0, RPTB or RPTBD is executing. Program fetches
      are from memory.

    - If RM = 1 and S = 1, RPTS is executing. After the first fetch (from
      memory), program fetches are from the instruction register (IR).
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The correct operation of the repeat modes requires that all of the above reg- 
isters and status register fields be initialized correctly. The RPTB, RPTBD, 
and RPTS instructions perform this initialization in slightly different ways 
(see subsections 6.1.2 and 6.1.3). 


6.1.2 RPTB and RPTBD Initialization 


The execution sequence of RPTB src or RPTBD src is nearly the same:

1) Loads the start address of the block into RS (repeat start address
   register).
   a) For RPTB, this is the next address following the instruction:
          PC of RPTB + 1 -> RS
      or
   b) For RPTBD, this is the fourth address following the instruction:
          PC of RPTBD + 4 -> RS
2) Loads the end address of the block into RE (repeat end address
   register).
   a) In PC-relative mode, the 24-bit src operand plus RS is the end
      address:
          For RPTB,
              src + PC of RPTB + 1 -> RE
          or
          For RPTBD,
              src + PC of RPTBD + 3 -> RE
   b) In register mode, the contents of the src register is the end address:
          contents of src register -> RE
3) Sets the status register to indicate the repeat mode of operation:
          1 -> RM status register bit (repeat mode flag)
4) Indicates that this is the repeat block mode of operation:
          0 -> S bit (bit is internal to processor; not programmable)


The last piece of information required is the number of times to repeat the
block. The value is determined by properly initializing the RC (repeat count)
register. Because the execution of RPTB and RPTBD does not load the RC, you
must load this register yourself. A typical setup of the block repeat operation
is shown below.


The repeat modes repeat a block of code at least once in a typical operation.
The repeat counter should be loaded with one less than the number of times
to execute the block; i.e., an RC value of 0 executes the block of code one
time, and an RC value of 4 executes the block five times. All block
repeats initiated by RPTB or RPTBD can be interrupted.
6.1.3 RPTS Initialization


When RPTS src is executed, the following sequence of operations occurs:

1) PC + 1 -> RS
2) PC + 1 -> RE
3) 1 -> RM status register bit
4) 1 -> S bit
5) src -> RC (repeat count register)


The RPTS instruction loads all registers and mode bits necessary for the
operation of the single-instruction repeat mode. Step 1 loads the start address
of the block into RS. Step 2 loads the end address into the RE (end address
of the block). Since this is a repeat of a single instruction, the start address
and the end address are the same. Step 3 sets the status register to indicate
the repeat mode of operation. Step 4 indicates that this is the repeat
single-instruction mode of operation. Step 5 loads src into RC.
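
The initialization sequences of subsections 6.1.2 and 6.1.3 can be summarized
in a small Python model. The names below are hypothetical; the sketch only
mirrors the register transfers listed in the numbered steps and assumes that
pc is the address of the RPTB, RPTBD, or RPTS instruction itself.

```python
from dataclasses import dataclass


@dataclass
class RepeatRegisters:
    rs: int = 0  # repeat start address
    re: int = 0  # repeat end address
    rc: int = 0  # repeat count
    rm: int = 0  # repeat-mode flag in the status register
    s: int = 0   # internal S bit


def rptb(regs: RepeatRegisters, pc: int, src: int,
         delayed: bool = False, register_mode: bool = False) -> None:
    """Model RPTB (delayed=False) or RPTBD (delayed=True)."""
    regs.rs = pc + (4 if delayed else 1)
    if register_mode:
        regs.re = src                              # contents of the src register
    else:
        regs.re = src + pc + (3 if delayed else 1)
    regs.rm = 1
    regs.s = 0
    # RC is deliberately left untouched: the program must load it separately.


def rpts(regs: RepeatRegisters, pc: int, src: int) -> None:
    """Model RPTS: start and end address coincide, and RC is loaded from src."""
    regs.rs = pc + 1
    regs.re = pc + 1
    regs.rm = 1
    regs.s = 1
    regs.rc = src
```

The model makes the main difference visible: rpts loads RC as part of the
instruction, while rptb leaves RC at whatever value was loaded beforehand.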


Repeats of a single instruction initiated by RPTS are not interruptible, be- 
cause the RPTS fetches the instruction word only once and then keeps it 
in the instruction register for reuse. An interrupt would cause the instruction 
word to be lost. The refetching of the instruction word from the instruction 
register reduces memory accesses and, in effect, acts as a one-word pro- 
gram cache. If it is necessary to have a single instruction that is repeatable 
and interruptible, you can use the RPTB instruction. 


6.1.4 Repeat-Mode Operation 
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Information in the repeat-mode registers and associated control bits is used 
to control the modification of the PC when the fetches are being made in re- 
peat mode. The repeat modes compare the contents of the RE register (re- 
peat end address register) with the program counter (PC). If they match and 
the repeat counter is nonnegative, the repeat counter is decremented, the 
PC is loaded with the repeat start address, and the processing continues. 
The fetches and appropriate status bits are modified as necessary. Note that
the repeat counter (RC) is never modified when the repeat-mode flag (RM)
is 0. The maximum number of repeats occurs when RC = 0 8000 0001h.
This will result in 0 8000 0001h repetitions. The detailed algorithm for the
update of the PC is shown in Figure 6-1.
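
The PC update for block repeats can be sketched in Python as follows. This is
a simplified model with hypothetical names, not a transcription of Figure 6-1.
It assumes that the nonnegative test is applied to the decremented count,
which is the reading consistent with the statement in subsection 6.1.2 that an
RC value of 0 executes the block once. It covers only the RPTB and RPTBD case
(S = 0).

```python
def next_fetch(pc: int, rs: int, re: int, rc: int, rm: int):
    """Return (next_pc, rc, rm) after the instruction at pc has been fetched."""
    if rm == 1 and pc == re:
        rc -= 1
        if rc >= 0:
            return rs, rc, rm      # loop back to the start of the block
        return pc + 1, rc, 0       # repeat finished: clear RM and fall through
    return pc + 1, rc, rm          # ordinary sequential fetch


def count_passes(rs: int, re: int, rc: int) -> int:
    """Count how many times the block from rs to re is fetched in repeat mode."""
    pc, rm, passes = rs, 1, 0
    while rm == 1:
        if pc == rs:
            passes += 1
        pc, rc, rm = next_fetch(pc, rs, re, rc, rm)
    return passes
```

Under these assumptions, count_passes gives RC + 1 passes through the block,
and RC is left unchanged whenever RM is 0.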


The RPTB and RPTS are four-cycle instructions. These four cycles of over- 
head are incurred only on the first pass through the loop. All subsequent 
passes through the loop are accomplished with zero cycles of overhead. In 
Example 6-1, the block of code from STLOOP to ENDLOP is repeated six- 
teen times. 
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Example 6-1. RPTB Operation 


Figure 6-1. | Repeat-Mode Control Algorithm 


Using the repeat block mode of modifying the PC facilitates analysis of what 
would happen in the case of branches within the block. Assume that the next 
value of the PC will be either PC + 1 or the contents of the RS register. It is 
thus apparent that this method of block repeat allows branching within the 
repeated block. Execution can go anywhere within the user’s code via inter- 
rupts, subroutine calls, etc. For proper modification of the loop counter, the 
last instruction of the loop must be fetched. By writing a 0 into the repeat 
counter or writing 0 into the RM bit of the status register, you can stop 
the repeating of the loop prior to completion. 
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Since the block repeat modes modify the program counter, other instruc- 
tions cannot modify the program counter at the same time. Two rules apply: 
Rule 1: The last instruction in the block (or the only instruction
in a block of size one) cannot be a Bcond, DBcond, CALL,
CALLcond, TRAPcond, RETIcond, RETScond, IDLE, RPTB,
or RPTS. Example 6-2 shows an incorrectly placed standard
branch.


Rule 2: None of the last four instructions from the bottom of the
block (or the only instruction in a block of size one) can be a
BcondD, BRD, DBcondD, RPTBD, LAJ, LAJcond, LATcond,
BcondAF, BcondAT, or RETIcond. Example 6-3 shows an
incorrectly placed delayed branch.


If either of these rules is violated, the PC will be undefined. 


Example 6-2. Incorrectly Placed Standard Branch


Example 6-3. Incorrectly Placed Delayed Branch

        ; This branch violates rule 2.


Block repeats (RPTB and RPTBD) are nestable. Since all of the control is
defined by the RS, RE, RC, and ST registers, these registers must be saved
and restored in order to nest block repeats. The status register RM bit can be
used to determine whether the block repeat mode is active. For example,
if you write an interrupt service routine that requires the use of RPTB or
RPTBD, it is possible that the interrupt associated with the routine may
occur during another block repeat. The interrupt service routine can check the
RM bit. If this bit is set, the interrupt routine saves RS, RE, RC, and ST. The
interrupt routine can then perform a block repeat. Before returning to the
interrupted routine, the interrupt routine restores RS, RE, RC, and ST. If the
RM bit is not set, you don’t need to save and restore these registers.
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6.2 Delayed Branches 


The TMS320C40 offers two main types of branching: standard and delayed. 
Standard branches empty the pipeline before performing the branch; this 
guarantees correct management of the program counter and results in a 
TMS320C40 branch taking four cycles. Included in this class are repeats, 
calls, returns, and traps. 


Delayed branches without annulling do not empty the pipeline but, rather,
guarantee that the next three instructions will execute before the program
counter is modified by the branch. Delayed branches with annulling may
conditionally annul the next three instructions. The result is a branch that
requires only a single cycle, thus making the speed of the delayed branch very
close to the optimal block repeat modes of the TMS320C40. However,
unlike block repeat modes, delayed branches may be used in situations other
than looping. Every delayed branch has a standard branch counterpart that
is used when a delayed branch cannot be used. The delayed branches
without annulling are BcondD, BRD, and DBcondD. Those with annulling are
BcondAT and BcondAF.


Conditional delayed branches use the conditions, reflected in the status
register, that existed at the end of the instruction preceding the branch. They
do not depend upon the instructions following the delayed branch. Delayed
branches without annulling guarantee that the next three instructions will
execute, regardless of other pipeline conflicts.

When a delayed branch is fetched, it remains pending until the three
instructions that follow are executed. None of the three instructions
immediately after a delayed branch can be any of the following (see
Example 6-4):

Bcond       BRD         IDLE        RETIcondD
BcondD      DBcond      LAJ         RETScond
BcondAF†    DBcondD     LAJcond     RPTB
BcondAT†    CALL        LATcond     RPTBD
BR          CALLcond    RETIcond    RPTS
TRAPcond

† BcondAF and BcondAT are described in Section 6.3 on page 6-9.


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Incorrectly used delayed branches can leave the PC undefined. 


Delayed branches disable interrupts until the three instructions following the 
delayed branch are completed. This is independent of whether or not the 
branch is taken. 


Example 6-4. Incorrectly Placed Delayed Branches 


The BcondAT and BcondAF instructions both branch if the condition is
true, but

  ■ BcondAT executes but annuls (cancels the effect of, except for the
    time delay) the execute phase of the three instructions following
    BcondAT. Then it takes the branch. If cond is false, execution
    continues immediately after the BcondAT.
  ■ BcondAF first executes the three instructions following the
    BcondAF. Then it takes the branch. If cond is false, execution
    continues immediately after the BcondAF, but the execute phase of
    the first three instructions is annulled.
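The annulling rules can be summarized in a small Python model. This is only a sketch of the behavior described in the text, not a pipeline simulator; the function name, the string labels for the branch kinds, and the use of plain strings for instructions are illustrative assumptions.

```python
def run_delayed_branch(kind, cond, slots, target, fallthrough):
    """Return, in order, the instructions whose execute phase takes effect.

    kind        -- "D" (BcondD, no annulling), "AT" (BcondAT) or "AF" (BcondAF)
    cond        -- True if the branch condition is true
    slots       -- the three instructions that follow the delayed branch
    target      -- label for the instruction at the branch destination
    fallthrough -- label for the instruction after the three slots
    """
    if len(slots) != 3:
        raise ValueError("a delayed branch is followed by three slot instructions")

    if kind == "D":
        executed = list(slots)                   # slots always execute
    elif kind == "AT":
        executed = [] if cond else list(slots)   # annulled when cond is true
    elif kind == "AF":
        executed = list(slots) if cond else []   # annulled when cond is false
    else:
        raise ValueError("unknown branch kind: %r" % kind)

    executed.append(target if cond else fallthrough)
    return executed
```

For example, `run_delayed_branch("AT", True, ["i1", "i2", "i3"], "target", "next")` yields only `["target"]`, because the three slot instructions are annulled when the branch is taken.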
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6.3 Calls, Traps, Branches, Jumps and Returns 


Calls and traps provide a means of executing a subroutine or function while 
providing a return to the calling routine.


The CALL, CALLcond, and TRAPcond instructions store the value of the
PC on the stack before changing the PC’s contents. The RETScond or
RETIcond (standard or delayed) instructions return execution from traps
and calls using the value on the stack.

□ CALL places the next PC value on the stack and places the src (source)
  operand into the PC. The src is a 24-bit PC-relative or register value.
  Figure 6-2 shows CALL response timing.

□ CALLcond is similar to the CALL instruction (above) except that (1) it
  executes only if a specific condition is true (the 20 conditions,
  including unconditional, are listed in Section 11.2 on page 11-10) and
  (2) the src is either a PC-relative displacement or in register
  addressing mode.

□ TRAPcond also executes only if a specific condition is true (same
  conditions as for the CALLcond instruction). When it executes, (1)
  interrupts are disabled with 0 written to bit GIE of the ST, (2) the
  next PC value is stored on the stack, and (3) a vector is retrieved from
  one of the addresses from 20h to 3Fh and loaded into the PC. The
  particular address corresponds to a trap number in the instruction.
  Using RETIcond or RETIcondD to return re-enables interrupts if the
  status register’s GIE bit was set previously.

□ RETScond returns execution from any of the above three instructions
  by popping the top of the stack to the PC. For RETScond to execute,
  the specified condition must be true. Conditions are the same as for the
  CALLcond instruction.

□ RETIcond returns from traps or calls in the same way as the RETScond
  (above) does with the addition that RETIcond also copies the PGIE and
  PCF bit values into the GIE and CF bits of the status register.
  Conditions are the same as for the CALLcond instruction.

□ RETIcondD returns from traps or calls the same way as the RETIcond
  (above) does with the addition that RETIcondD also first executes the
  next three instructions immediately following the RETIcondD.
  Conditions are the same as for the CALLcond instruction.

□ Link and jump (LAJ), link and jump conditional (LAJcond), and link and
  trap conditional (LATcond) each provide a return address in
  extended-precision register R11.

  ■ After it executes the three instructions that follow it, LAJ jumps to
    an address derived by the concatenation of the most significant 8
    bits of the PC and the 24-bit src address in the instruction.




  ■ LAJcond destination address is either PC-relative (a displacement)
    or the contents of a specified register. If the condition is true,
    LAJcond first executes the three instructions following the LAJcond
    before making the jump. If the condition is not true, execution
    continues immediately after the LAJcond instruction.

  ■ After it executes the three instructions that follow it, LATcond calls
    one of the 512 available trap vectors pointed to by the trap vector
    table pointer (TVTP) in Section 3.2 on page 3-15. The vector value
    is loaded into the PC.
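The LAJ address concatenation described in the list above can be written out as bit arithmetic. The following Python sketch assumes a 32-bit PC (8 bits from the PC plus 24 bits from src); the function name is illustrative.

```python
def laj_target(pc, src24):
    """Concatenate the 8 most significant bits of the PC with a 24-bit src."""
    if not 0 <= src24 <= 0xFFFFFF:
        raise ValueError("src must fit in 24 bits")
    # Keep PC bits 31-24, replace bits 23-0 with the src field.
    return (pc & 0xFF000000) | src24
```

With these assumptions, a PC of 8000 1234h and a src of 00 0100h give a jump address of 8000 0100h.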

Functionally, calls and traps accomplish the same task: namely, a
subfunction is called and executed, and control is then returned to the
calling function. Traps offer several advantages:

1) Interrupts are automatically disabled when a trap is executed. This
   allows critical code to execute without risk of being interrupted. Thus,
   traps are usually terminated with a RETIcond or RETIcondD instruction
   to re-enable interrupts if the status register GIE bit was set previously.

2) You can use traps to indirectly call functions. This is particularly
   beneficial when a kernel of code contains the basic subfunctions to be
   used by applications. In this case, the functions in the kernel can be
   modified and relocated without recompiling each application.


Figure 6-2. CALL Response Timing

[Timing diagram not reproduced. Signals: H3, ADDR. Phases shown: fetch
CALL, store PC on stack, vector address, first instruction.]
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6.4 Unifying Traps and Interrupts 


Traps and interrupts on the TMS320C4x are unified in all forms of operation 
except initialization. 


6.4.1 Initialization 


Traps and interrupts differ in initialization:

□ Traps are always triggered by a software mechanism, either with
  TRAPcond (conditional trap) or LATcond (link and trap conditionally,
  delayed).

□ Interrupts are triggered by hardware events (i.e., external interrupts,
  DMA interrupts, or communication channel interrupts).


6.4.2 Operation 


Figure 6—3 shows the unified flow of traps and interrupts. 


For an interrupt, step (1) in the figure happens after completion of the last 
instruction that was fetched before completion of the interrupt flush. This 
guarantees later restoration of correct flag values. 


Figure 6-3. Unified Flow of Traps and Interrupts

[Flow diagram not reproduced. Entry points: Interrupt Received; Trap
Executed (TRAPcond or LATcond). Numbered steps follow, ending with Return
Executed (RETIcond or RETIcondD).]
LATcond (link and trap conditionally) is a delayed instruction that provides
a single-cycle trap that is very useful for error detection and correction.
Since LATcond is a delayed instruction, the three instructions following
LATcond should not modify the GIE or CF status register bits (this could
result in storing incorrect values of these two bits).


The RETIcond and RETIcondD instructions manipulate the status flags as 
shown in step (3) in the figure. RETIcondD provides a delayed return from 
a trap or interrupt. Since traps and interrupts are unified, the RETIcond pro- 
vides a return from either. 


In general, you should not directly modify the PGIE or PCF status register 
bits except when putting the status register on a stack for recursive inter- 
rupts or traps. 
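The handling of the status bits can be illustrated with a minimal Python model. The return step follows the text (RETIcond copies PGIE and PCF into GIE and CF). The entry step is an assumption made for this sketch: it saves GIE and CF into PGIE and PCF before clearing GIE, which is what the "previous" bits imply but is not spelled out here. Class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Status:
    gie: int = 1    # global interrupt enable
    pgie: int = 0   # previous GIE
    cf: int = 0     # CF status bit
    pcf: int = 0    # previous CF


def enter_trap_or_interrupt(st):
    # Assumption: the current GIE and CF values are saved on entry.
    st.pgie, st.pcf = st.gie, st.cf
    st.gie = 0      # interrupts are disabled (stated in the text)


def return_reti(st):
    # RETIcond / RETIcondD: restore GIE and CF from PGIE and PCF.
    st.gie, st.cf = st.pgie, st.pcf
```

In this model, a routine that overwrites PGIE between entry and return would restore the wrong GIE value, which is why the text advises against modifying PGIE or PCF directly.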
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6.5 Interlocked Operations 


One of the most common parallel processing configurations is the sharing
of global memory by multiple processors. In order for multiple processors
to access this global memory and share data in a coherent manner, some
sort of arbitration or handshake is necessary. The TMS320C40 interlocked
operations meet this requirement for arbitration. More details are given in
Section 7.7 on page 7-39. Examples in this section show you how
interlocked operations can be used to implement:

□ A busy-waiting loop used to synchronize processors at the software
  level (Example 6-5, page 6-15),

□ A counter shared between cooperative processors defining the number
  of times a task should be done by the processors (Example 6-6 on page
  6-15),

□ Semaphores used to ease the programming of critical sections
  (Example 6-7 and Example 6-8 on page 6-16).


The TMS320C40 has five instructions referred to as interlocked operations. 
Through the use of external signals, these instructions provide powerful 
synchronization mechanisms. They also guarantee the integrity of the com- 
munication and result in a high-speed operation. The interlocked-operation 
instruction group is listed in Table 6-2. 


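Table 6-2. Interlocked Operations

Mnemonic   Description
LDFI       Load floating-point value into a register, interlocked
LDII       Load integer into a register, interlocked
SIGI       Signal, interlocked
STFI       Store floating-point value from a register to memory,
           interlocked
STII       Store integer from a register to memory, interlocked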


The interlocked operations use the global- and local-bus signals, LOCK and 
LLOCK, to reflect a currently executing interlocked operation. This signal is 
active (low) when any of the interlocked instructions in Table 6-2 are ex- 
ecuting. 
The external timing for the interlocked loads and stores is the same as for
standard loads and stores. You can extend the interlocked loads and stores
like standard accesses by using the appropriate ready signal (RDYx or
LRDYx).


The LDFI and LDII instructions perform the following actions: 

1) Pull (L)LOCK low. 

2) Execute an LDF or LDI instruction. 

3) Extend the read cycle until the appropriate ready signal is received. 
Complete the instruction. 

4) Leave (L)LOCK active low until changed by an STFI, STII, or SIGI.


The read/write operation is identical to any other read/write cycle except for 
the special use of (L)LOCK. The src operand for LDFI and LDII is always 
a direct or indirect memory address. (L)LOCK is set to 0 only if the src is
located off-chip (i.e., STRB or LSTRB is active). If on-chip memory is
accessed, then (L)LOCK is not asserted, and the operation is as an LDF or LDI
from internal memory.


The STFI and STII instructions perform the following operations:

1) Begin a write cycle. The state of (L)LOCK does not change. If it is low,
   an interlocked operation occurs. If high, the operation is as if an STF or
   STI is performed (not interlocked).

2) Execute an STF or STI instruction and extend the write cycle until the
   appropriate ready is signaled.

3) After the write cycle, bring (L)LOCK inactive (high).


As in the case for LDFI and LDII, the dst of STFI and STII affects (L)LOCK.
If dst is located off-chip (STRB(0,1) or LSTRB(0,1) is active), (L)LOCK is set
to a 1. If on-chip memory is accessed, then (L)LOCK is not asserted, and
the operations are as a STF or STI to internal memory.


The SIGI instruction functions as follows: 

1) Pulls (L)LOCK low. 

2) Executes an LDI instruction. 

3) Extends the read cycle until the appropriate ready signal is received. 
Completes the instruction. 

4) Brings (L)LOCK back inactive high. 
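The effect of the five instructions on the (L)LOCK pin can be traced with a small Python state model. It is a sketch of the pin level only (0 = active, 1 = inactive) and ignores bus timing and ready signals; the class and method names are illustrative assumptions.

```python
class LockPin:
    """Tracks the (L)LOCK level: 0 = interlock active, 1 = inactive."""

    def __init__(self):
        self.level = 1
        self.trace = []          # every level the pin was driven to

    def _drive(self, level):
        self.level = level
        self.trace.append(level)

    def load_interlocked(self, off_chip=True):
        """LDFI / LDII: pull the pin low and leave it low (off-chip only)."""
        if off_chip:
            self._drive(0)

    def store_interlocked(self, off_chip=True):
        """STFI / STII: bring the pin high after the write (off-chip only)."""
        if off_chip:
            self._drive(1)

    def signal_interlocked(self):
        """SIGI: pull the pin low for the read, then bring it back high."""
        self._drive(0)
        self._drive(1)
```

An LDII followed by an STII therefore brackets a read-modify-write sequence with the pin held low, while an on-chip access leaves the pin untouched.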


Interlocked operations can be used to implement a busy-waiting loop, to 
manipulate a multiprocessor counter, to implement a simple semaphore 
mechanism, or to perform synchronization between two TMS320C40s. The 
following examples illustrate the usefulness of the interlocked-operation
instructions.
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Example 6-5 shows the implementation of a busy-waiting loop. If location 
LOCK is the interlock for a critical section of code, and a nonzero means the 
lock is busy, the algorithm for a busy-waiting loop can be used as shown. 


Example 6-5. Busy-Waiting Loop 
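As an illustration of the algorithm described above, the following Python sketch models the busy-waiting loop with threads standing in for processors. It is not TMS320C40 code: a `threading.Lock` plays the role of the external bus arbitration that holds off other processors while (L)LOCK is low, and the names `InterlockedWord`, `acquire`, `release`, and `demo` are assumptions made for this sketch.

```python
import threading
import time


class InterlockedWord:
    """One shared memory word; a Lock stands in for the bus interlock."""

    def __init__(self, value=0):
        self.value = value
        self._bus = threading.Lock()

    def ldii(self):
        self._bus.acquire()      # interlock begins; others are held off
        return self.value

    def stii(self, value):
        self.value = value
        self._bus.release()      # interlock ends


def acquire(lock_word):
    """Spin until the previous content of LOCK was 0, leaving 1 (busy)."""
    while True:
        previous = lock_word.ldii()
        lock_word.stii(1)
        if previous == 0:
            return
        time.sleep(0)            # yield before trying again


def release(lock_word):
    lock_word.ldii()
    lock_word.stii(0)            # 0 means the lock is free


def demo(workers=3, rounds=50):
    lock_word = InterlockedWord(0)
    total = [0]

    def worker():
        for _ in range(rounds):
            acquire(lock_word)
            total[0] += 1        # critical section
            release(lock_word)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Because the read of the old value and the write of 1 happen inside one interlocked sequence, two processors can never both observe 0 and enter the critical section together.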


Example 6-6 shows how a location COUNT may contain a count of the 
number of times a particular operation needs to be performed. This opera- 
tion may be performed by any processor in the system. If the count is zero, 
the processor waits until it is nonzero before beginning processing. The ex- 
ample also shows the algorithm for modifying COUNT correctly. 


Example 6-6. Task Counter Manipulation 


Figure 6—4 illustrates multiple TMS320C40s sharing global memory and 
using the interlocked instructions as in Example 6—7 and Example 6-8. 
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Figure 6-4. Multiple TMS320C40s Sharing Global Memory 
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Sometimes it may be necessary for several processors to access some 
shared data or other common resources. The portion of code that must ac- 
cess the shared data is called a critical section. 


To ease the programming of critical sections, semaphores may be used. 
Semaphores are variables that can take only nonnegative integer values. 
Two primitive, indivisible operations are defined on semaphores (with S be- 
ing a semaphore): 


    V(S):  S + 1 → S
    P(S):  P: if (S == 0), go to P
              else S - 1 → S


Indivisibility of V(S) and P(S) means that when these processes access and 
modify the semaphore S, they are the only processes doing so. 


To enter a critical section, a P operation is performed on a common
semaphore, e.g., S (S is initialized to 1). The first processor performing
P(S) will be able to enter its critical section. All other processors are
blocked because S has become 0. After leaving its critical section, the
processor performs a V(S), thus allowing another processor to execute P(S)
successfully.


The TMS320C40 code for V(S) is shown in Example 6—7, and code for P(S) 
is shown in Example 6-8. Compare the code in Example 6—8 to the code 
in Example 6-6. 
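The P and V primitives can be sketched in Python in the same spirit, with threads standing in for processors. This is an illustrative model, not TMS320C40 code: a `threading.Lock` represents the bus interlock held between an interlocked load and the matching interlocked store, and storing the unchanged value back when S is 0 is a modeling choice made here to end the interlock before retrying. All names are assumptions.

```python
import threading
import time


class InterlockedSemaphore:
    def __init__(self, value=1):
        self.value = value
        self._bus = threading.Lock()   # stands in for the (L)LOCK interlock

    def _ldii(self):
        self._bus.acquire()
        return self.value

    def _stii(self, value):
        self.value = value
        self._bus.release()

    def v(self):
        """V(S): S + 1 -> S, as one indivisible operation."""
        self._stii(self._ldii() + 1)

    def p(self):
        """P(S): wait while S == 0, then S - 1 -> S."""
        while True:
            s = self._ldii()
            if s > 0:
                self._stii(s - 1)
                return
            self._stii(s)              # end the interlock, then retry
            time.sleep(0)


def demo(workers=3, rounds=20):
    sem = InterlockedSemaphore(1)
    inside = [0]                       # processors currently in the section
    worst = [0]                        # highest occupancy ever observed

    def worker():
        for _ in range(rounds):
            sem.p()
            inside[0] += 1
            worst[0] = max(worst[0], inside[0])
            inside[0] -= 1
            sem.v()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return worst[0], sem.value
```

With S initialized to 1, the occupancy of the critical section never exceeds one, and S returns to 1 once every worker has performed its final V(S).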
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6.6 Reset Operation 


The TMS320C40 supports a nonmaskable external reset signal (RESET), 
which is used to perform system reset. This section discusses the reset op- 
eration. 


At powerup, the state of the TMS320C40 processor is undefined. You can 
use the RESET signal to place the processor in a known state. This signal 
must be asserted low for 10 or more H1 clock cycles to guarantee a system 
reset. H1 is an output clock signal generated by the TMS320C40 (see Chap- 
ter 13 for more information). 


Reset affects the other pins on the device in either a synchronous or
asynchronous manner. The synchronous reset is gated by the
TMS320C40’s internal clocks. The asynchronous reset directly affects the
pins, and it is faster than the synchronous reset. Table 6-3 shows the state
of the TMS320C40’s pins after RESET = 0. Each pin is described according
to whether the pin is reset synchronously or asynchronously.


Table 6-3. Pin Operation at Reset

[Table partially recovered; rows that are not fully legible are omitted.]

Signal           # Pins  Type  Operation at Reset
LDE              1       I     Reset has no effect.
LA(30-0)         31      O/Z   Synchronous reset. Placed in high-impedance
                               state.
LAE              1       I     Reset has no effect.
LSTAT(3-0)       4       O     Synchronous reset. Set to all ones.
LRDY1            1       I     Reset has no effect.
LCE1             1       I     Reset has no effect.
CRDY1            1       I/O   Asynchronous reset. Placed in high-impedance
                               state.
CACK3            1       I/O   Asynchronous reset. Placed in high-impedance
                               state.
CREQ5            1       I/O   Asynchronous reset. Placed in high-impedance
                               state.
IIOF(3-0)        4       I/O   Asynchronous reset. Placed in high-impedance
                               state.
NMI              1       I     Reset has no effect.
IACK             1       O     Synchronous reset.
RESET            1       I     RESET input pin.
RESETLOC(1,0)    2       I     Reset has no effect.
ROMEN            1       I     Reset has no effect.
TCLK0            1       I/O   Asynchronous reset. Placed in high-impedance
                               state.
X2/CLKIN         1       I     Reset has no effect.
H1, H3           1 each  O     Synchronous reset. Will go to its initial
                               state when RESET makes a 1 to 0 transition.
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At system reset, the following additional operations are performed: 

□ Timer registers (Section 9.10 on page 9-45) are set.

  ■ Timer global control register set to zero except that bit DATIN is set
    to the value on pin TCLK.

  ■ Timer counter and timer period registers set to zeroes.

□ Communications port control registers (subsection 8.4.1 on page 8-9)
  set to zeroes.

□ External memory interface control registers (Section 7.2 on page 7-6)
  are set to 3E39 FFF0h.

□ DMA channel control register, DMA transfer counter, and DMA auxiliary
  transfer counter (subsection 9.3.1 on page 9-7) are set to zeroes.

□ The following CPU registers are loaded with zeroes (each described in
  Chapter 3):

  ■ ST (CPU status register)
  ■ IIE (CPU internal interrupt enable register)
  ■ IIF (interrupt flag register; controls pins IIOF(3-0))
  ■ DIE (DMA internal enable register)
  ■ IVTP (interrupt-vector table pointer)
  ■ TVTP (trap-vector table pointer)

□ Then the reset vector is read from its location and loaded into the PC.
  This vector contains the start address of the system reset routine.

□ Execution begins. Refer to Section 12.1 on page 12-3 for an example
  of a processor initialization routine.


Multiple TMS320C40s driven by the same system clock may be reset and 
synchronized. When the 1-to-0 transition of RESET occurs, the processor 
is placed on a well-defined internal phase, and all of the TMS320C40s will 
come up on the same internal phase. 
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6.7 Interrupts 


The TMS320C40 supports multiple internal and external interrupts, which 
can be used for a variety of applications. This section discusses the opera- 
tion of these interrupts. Additional information regarding internal interrupts 
can be found in Section 8.4 (page 8-8), Section 8.6 (page 8-17), Table 8—1 
(communication ports on page 8-10), Section 9.9 (DMA on page 9-40), and 
Section 9.10 (timers on page 9-45). 


The four external interrupts (IIOF0-IIOF3 as shown in Figure 6-6) are
enabled at the IIE register (subsection 3.1.9, page 3-10). They are
synchronized internally. They are sampled on the falling edge of H1 and
passed through a series of H1/H3 delays internally. Once synchronized, the
interrupt input will set the corresponding interrupt flag register (IIF) bit
if the interrupt is active. These are the external interrupts and their
corresponding interrupt vectors (the latter shown in Figure 6-6 on page
6-27):


IIOF Pin and Interrupt    Interrupt Vector Location
IIOF0                     IVTP + 003h
IIOF1                     IVTP + 004h
IIOF2                     IVTP + 005h
IIOF3                     IVTP + 006h

These interrupts are prioritized in that one is selected over the other if both
come on the same clock cycle (IIOF0 the highest, IIOF1 next, etc.). When
an interrupt is taken, the status register ST(GIE) bit is reset to 0, disabling
any other incoming interrupt (except NMI, the nonmaskable interrupt). This
prevents any other interrupt (IIOF0-3) from assuming program control until
the ST(GIE) bit is set back to 1. The NMI (an incoming low on pin AJ5, signal
NMI) is not masked by the ST(GIE) bit. On a return from an interrupt routine,
the RETI and RETIcond instructions place the value that is in the ST(PGIE)
bit into the ST(GIE) bit, returning it to its value before the context switch.


Even though the NMI is nonmaskable, it is temporarily masked during de- 
layed branches and multicycle CPU operations. NMI is a negative-going, 
edge-triggered, latched interrupt. 


External interrupts can be effectively either edge- or level-triggered,
depending on how the TYPE fields are set in the IIF register (see Table 3-6
on page 3-13). An external interrupt must be held low for at least one H1/H3
cycle to be recognized by the TMS320C40. For level-triggered interrupts,
if the interrupt is held low for between one and two cycles, then only one
interrupt is recognized. If the interrupt is held low two or more cycles, more
than one interrupt may be recognized, depending on how rapidly interrupts
are serviced.
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6.7.1 Interrupt Control Bits 


When a particular interrupt is processed by the CPU or DMA controller, the
corresponding interrupt flag bit is cleared by the internal interrupt
acknowledge signal. It should be noted, however, that for level-triggered
interrupts, if IIOFn is still low when the interrupt acknowledge signal
occurs, the interrupt flag bit will be cleared for only one cycle and then set
again because IIOFn is still low. Accordingly, it is theoretically possible
that, depending on when the IIF register (described in subsection 3.1.10 on
page 3-12) is read, this bit may be zero even though IIOFn is zero. When the
TMS320C40 is reset, zero is written to the interrupt flag register, thereby
clearing all pending interrupts.


The interrupt flag register bits may be read and written to under software
control. If, at the IIF register, FUNCx = 0 and TYPEx = 1, then external pin
IIOFx can be written to. Writing a 1 to the IIF register FLAGx bit has the same
effect as an incoming interrupt received on the corresponding pin. In this
way, all interrupts may be triggered and/or cleared through software. Since
the interrupt bits also may be read (TYPEx = 0), the interrupt pins may be
polled in software when an interrupt-driven interface is not required.


Internal interrupts operate in a similar manner. In the IIF register, the bit
corresponding to an internal interrupt (e.g., TINT0, TINT1) may be read and
written to through software. Writing a 1 sets the interrupt latch, and writing
a 0 clears it. All internal interrupts are one H1/H3 cycle in length.


The CPU global interrupt enable bit (GIE), located in the CPU status register
(ST), controls all CPU interrupts. All DMA interrupts are controlled by the
DMA enable register bits and the SYNC bits of the DMA channel control
registers (described in Figure 9-2 and Table 9-1 on page 9-8). The DMA
interrupts are not dependent upon ST(GIE) and are local to the DMA.


To provide for maximum performance in servicing interrupts, the interrupt
acknowledge (IACK) instruction is provided. IACK drives the IACK pin and
performs a dummy read. The read is performed from the address specified
by the IACK instruction operand. When IACK is used, it typically is placed
in the early portion of an interrupt service routine. For certain applications,
it may be better suited at the end of the interrupt service routine or be totally
unnecessary.


6.7.2 Prioritization and Control 
The prioritization of interrupts is handled by the CPU according to the
interrupt vector table shown in Figure 6-6. Prioritization is according to
position in the table: those with displacements closest to the IVTP base
address are higher in priority (i.e., NMI is higher than TINT0, which is
higher than IIOF0, etc.). Note that interrupt TINT0 is located at IVTP + 2
while the TINT1 vector is after the communication port and DMA coprocessor
interrupts at IVTP + 2Bh.


Prioritization means an interrupt in a higher position in the interrupt vector
table (Figure 6-6) will be accepted over one in a lower position when both
are received in the same clock cycle. It does not mean, for example, that
IIOF3 must wait until service routines for IIOF2, IIOF1, and IIOF0 are
completed (when ST(GIE) = 1).
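The priority rule amounts to choosing, among the interrupts received in the same clock cycle, the one with the smallest displacement from IVTP. The Python sketch below uses only the displacements quoted in this section; the NMI displacement of 1 is an assumption (the text states only that NMI ranks above TINT0), and the function names are illustrative.

```python
# Displacements from the IVTP base address.
VECTOR_OFFSET = {
    "NMI": 0x01,     # assumption: not stated in the text
    "TINT0": 0x02,
    "IIOF0": 0x03,
    "IIOF1": 0x04,
    "IIOF2": 0x05,
    "IIOF3": 0x06,
    "TINT1": 0x2B,
}


def vector_address(ivtp, name):
    """Address of the vector for the named interrupt."""
    return ivtp + VECTOR_OFFSET[name]


def select_interrupt(pending):
    """Pick the highest-priority interrupt among those received together."""
    pending = list(pending)
    if not pending:
        return None
    # The smallest displacement is closest to IVTP and has highest priority.
    return min(pending, key=VECTOR_OFFSET.__getitem__)
```

For instance, if IIOF3, IIOF1, and TINT1 arrive in the same cycle, `select_interrupt` returns `"IIOF1"`; the others remain pending rather than being lost.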


If the DMA coprocessor is not using interrupts for synchronization of
transfers, it will not be affected by the processing of the CPU interrupts. If
the CPU is involved in a pipeline conflict (branch, register, or memory), it
will not respond to the interrupts until that conflict is resolved. It is
therefore possible to interrupt the CPU and DMA coprocessor simultaneously
with the same or different interrupts and, in effect, synchronize their
activities. For example, it may be necessary to cause a high-priority DMA
coprocessor transfer that avoids bus conflicts with the CPU, i.e., makes the
DMA coprocessor a higher priority than the CPU. This may be accomplished by
using an interrupt that causes the CPU to trap to an interrupt routine that
contains an IDLE instruction. Then, if the same interrupt is used to
synchronize DMA coprocessor transfers, the DMA coprocessor transfer counter
can be used to generate an interrupt and, thus, return control to the CPU
following the DMA coprocessor transfer.


Since the DMA coprocessor and CPU share the same set of interrupt flags, 
the DMA coprocessor may clear an interrupt flag before the CPU can re- 
spond to it. For example, if the CPU interrupts are disabled, the DMA 
coprocessor can respond to interrupts and thus clear the associated inter- 
rupt flags. 


Note the following situations: 


□ If there is a delayed branch in the pipeline, interrupts are held pending
  until after the branch.

□ If the interrupt occurs in the first cycle of the fetch of an instruction,
  the fetched instruction is discarded (not executed), and the address of
  that instruction is pushed to the top of the system stack.

□ If the interrupt occurs after the first cycle of the fetch (in the case of
  a multicycle fetch due to wait states), that instruction is executed, and
  the address of the next instruction to be fetched is pushed to the top of
  the system stack.

□ If no program fetch is occurring, then no new fetch is performed.


After the address of the appropriate instruction has been pushed, the
interrupt vector is fetched and loaded into the PC, and execution continues.
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Figure 6-5. 
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Note 5 


Note 5 


Note 6 


Note 3 
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Reserved for the reset vector when IVTP = 0000 0000h and RESETLOC(1,0) = 0 05 or 
when IVTP=08000 0000h and RESETLOC(1,0) = 1 05. See Table 3-8. 

NMI (non—maskable interrupt) is discussed in Section 9.9, page 9-40. 

Timer interrupts TINTO and TINT1 are enabled and programmed by the IIE register (subection 
3.1.9, page 3-10) and monitored at the IIF register (subection 3.1.10, page 3-12). 

External pins IIOFO—IIOF5 are programmed in the DIE register (subsection 3.1.8, page 3-8) 
and the IIF register (subection 3.1.10, page 3-12). 
The communication port |/O buffers full/ready interrupts are enabled by the DIE and IIE re— 
gisters and also discussed in Table 8-1, page 8-10 (OUTPUT LEVEL & INPUT LEVEL bits). 
DMA interrupts are enabled at the IIE register and DMA channel control register (at bits TCC 
and AUX TCC explained in Table 9-1 on page 9-8). 
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Figure 6-6. 
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The TMS320C40 allows the CPU and DMA coprocessor to respond to and 
process interrupts in parallel. Figure 6—6 shows interrupt processing flow. 
The interrupts are polled, and the CPU and DMA coprocessor begin pro- 
cessing them. In the interrupt flow pertaining to the CPU (left side of figure), 
the interrupt flag corresponding to the highest priority enabled interrupt is 
cleared, and GIE is set to 0. The CPU completes all fetched instructions. The
interrupt vector is fetched and loaded into the PC, and the CPU continues 
execution. The DMA coprocessor cycle (right side of figure) is similar to that 
for the CPU. After the pertinent interrupt flag is cleared, the DMA coproces- 
sor proceeds according to the status of the SYNCH bits in the DMA 
coprocessor global control register. 


[Flowchart "Interrupt Processing" not reproduced.]
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The TMS320C40 has two identical 80-pin parallel external interfaces: the 
global memory interface and the local memory interface. Each interface 
has the following features: 


□ separate 80-pin configurations, each with its own 32-bit data bus and
  31-bit address bus,

□ single-cycle reads and pipelined writes,

□ independent enable signals for data, address, and control lines,

□ bus-request and bus-lock signaling for shared-memory parallel
  processing,

□ user-controlled mapping of addresses to either of two sets of
  independent strobes for different-speed memories,

□ look-ahead bus status signals for defining current and requested bus
  operations for parallel processing arbitration,

□ selectable wait states (both software- and hardware-controlled),

□ signals that indicate when memory page boundaries are crossed. This
  supports
  ■ page-mode and static-column decode DRAMs,
  ■ high-speed SRAM banks, and
  ■ slower-speed memory banks and I/O devices.


Note: Description Covers Both Interfaces in this Chapter 


This chapter covers both the global memory interface and the local memory 
interface. However, only the global memory interface is shown throughout 
this chapter because it is identical in every way to the local memory inter- 
face except that (1) they have different positions in the memory map, and 
(2) the control signals for the local memory interface have an additional 
“L” prefix (as described in Figure 7—1 on page 7-3). 
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External Bus Operation 


7.1 Global (and Local) Memory Interface Control Signals 


As explained in the Note on page 7—1, this text covers the global memory 
interface control signals; it also applies to the local memory interface control 
signals (with the exceptions stated in the note).


Figure 7-1. Global and Local Memory Interface Control Signals 


NOTE: The signals used in this figure are for the global memory interface.
However, local memory interface signals have the same configuration except
that an additional “L” (for local) prefix is added for each signal (e.g.,
R/W0 becomes LR/W0, and STRB0 becomes LSTRB0, etc.).


As shown in Figure 7-1, the global memory interface has two sets of control
signals, STRB0 and STRB1. The global memory port control registers
(Section 7.2 on page 7-6) define which set of registers is active.
Table 7-1. Global Memory Interface Control Signals†

Signal       Type§  Description
R/W(0,1)     O/Z    Specifies memory read (active high) or write (active
                    low) mode.
STRB(0,1)    O/Z    Interface access strobe.
PAGE(0,1)    O/Z    Memory-page enable signal for STRB(0,1) accesses.
RDY(0,1)     I      Indicates external memory is ready to be accessed.
CE(0,1)      I      Control signal enable for R/Wx, STRBx, and PAGEx
                    signals. When high (a one), it places the corresponding
                    R/Wx, STRBx, and PAGEx signals in high-impedance state
                    (x=0 for CE0 and x=1 for CE1).
DE           I      When high (a one), places data lines D31-0 in
                    high-impedance state.
AE           I      When high (a one), places address lines A30-0 in
                    high-impedance state.
STAT(3-0)‡   O      Four lines to define status or function of the memory
                    port as shown in Table 7-2 (next page).
LOCK‡        O      Indicates if an interlocked access is underway (0 =
                    access underway; 1 = access not underway). LOCK is
                    changed only by the interlocked instructions.

† This table applies to both the global memory interface and local memory
  interface (local memory interface signals have an additional “L” prefix).
  The numbers in parentheses mean that either a 0 (zero) or a 1 can follow
  the prefix shown to the left of the parentheses. A zero indicates STRB0
  control signals (shown in Figure 7-1), and a one indicates STRB1 control
  signals.
§ O = output; I = input; Z = high impedance (three-stated).
‡ STAT(3-0) and LOCK cannot be placed in the high-impedance state by an
  external control signal.


Table 7-2 on the next page shows how pins STAT3 to STAT0 define the
current status of the global memory port. For many bus accesses, these
signals provide information about the access that is about to begin. The
code for a SIGI instruction read is useful for distinguishing between a
SIGI read and an LDII or LDFI read.


The bus idle status code is 1111₂ (bottom of Table 7-2), which simplifies
modular shared-bus multiprocessor interfaces, because pull-up resistors
can be used to signal the idle condition when processor cards are not
attached to the shared bus.
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This table applies to both the global memory interface and local memory 
interface (for local memory interface signals, add an additional “L” prefix 
such as LSTAT3, LSTAT2, etc.). 
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Local) Memory Interface Control Signals 


Status

STRB0 access, program read
data read
DMA read
STRB0 access, SIGI (instruction) read
Reserved
STRB0 access, data write
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7.2 Memory Interface Control Registers 
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As explained in the Note on page 7—1, this text covers the global memory 
interface control signals; it also applies to the local memory interface control 
signals (with the exceptions stated in the note). 


Figure 7-2 shows the memory map for both the global and local memory 
interface control registers. Each register can be programmed to control its 
respective memory interface by defining: 


- page sizes for the two strobes,
- when strobes are active,
- wait states,
- other operations that control the memory interface.


Table 7-3 (on page 7-8) describes the fields in these registers. 


At reset, the binary values shown above each bit in Figure 7-2 are written
to the global memory interface control register. Values in bits 3-0 are the
values at these bits' respective pins (AE, DE, CE1, and CE0). This reset
condition has the following effects (for the local and global bus):

- STRB0 and STRB1 (LSTRB0 and LSTRB1) page sizes are set to 00111₂
  (256 words).

- STRB0 and STRB1 (LSTRB0 and LSTRB1) wait states are set to 7 cycles.

- STRB0 and STRB1 (LSTRB0 and LSTRB1) accesses require an external ready
  signal and an internal ready signal generated by the software
  wait-state generator.

- STRB0 (LSTRB0) is active for all addresses over the global (local)
  memory interface.

- Back-to-back reads that switch from STRB0 to STRB1 (or STRB1 to STRB0)
  result in the insertion of a single cycle between these reads.

As shown in Figure 7-2, fields STRB1 SWW and STRB0 SWW are both set to
11₂ so that the internal ready signal is generated from both RDYwtcnt
(the on-chip wait-state counter) and external RDY.
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Figure 7-2. Format for the Memory-Interface Control Registers 


[Register diagram: bit fields of the global and local memory interface
control registers, with the reset value shown above each bit and the access
type (RW or R) shown below each bit. The fields are described in Table 7-3.]

NOTES: 1. The register cell figure (immediately above) contains global memory
          interface control register mnemonics. However, local memory
          interface control register mnemonics can be visualized by adding
          an "L" prefix to each mnemonic in the figure (e.g., LSTRB SWW,
          LCE0, etc.).

       2. The 1s and 0s above each bit are the binary values written to the
          register at reset. The values at bits 3-0 are defined by the
          values of their respective external pins (AE, DE, CE1, and CE0).

3. These registers are shown in the overall memory map in Figure 3—9 and Figure 3—10 
on pages 3-19 and 3-20, respectively. 

4. RW = read/write; R = read. 
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Table 7-3. __ Bit Definitions for Both Memory Interface Control Registers 


Bit(s) | Name†          | Description
-------+----------------+---------------------------------------------------
0      | CE0            | Value of external pin CE0 (after it passes through
       |                | an internal synchronizer). The value is not
       |                | latched.
1      | CE1            | Value of external pin CE1 (after it passes through
       |                | an internal synchronizer). The value is not
       |                | latched.
2      | DE             | Value of external pin DE (after it passes through
       |                | an internal synchronizer). The value is not
       |                | latched.
3      | AE             | Value of external pin AE (after it passes through
       |                | an internal synchronizer). The value is not
       |                | latched.
4-5    | STRB0 SWW      | Software wait states for STRB0 access. In
       |                | conjunction with STRB0 WTCNT, this field defines
       |                | the mode of wait-state generation. Actual wait
       |                | states are explained in Section 7.4 and in
       |                | Table 7-7 on page 7-16.
6-7    | STRB1 SWW      | Software wait states for STRB1 access. In
       |                | conjunction with STRB1 WTCNT, this field defines
       |                | the mode of wait-state generation. Actual wait
       |                | states are explained in Section 7.4 and in
       |                | Table 7-7 on page 7-16.
8-10   | STRB0 WTCNT    | Software wait-state count for STRB0 accesses.
       |                | Specifies the number of cycles to use when
       |                | software wait states are active. Three-bit range
       |                | is from 000₂ (zero) to 111₂ (seven).
11-13  | STRB1 WTCNT    | Software wait-state count for STRB1 accesses.
       |                | Specifies the number of cycles to use when
       |                | software wait states are active. Three-bit range
       |                | is from 000₂ (zero) to 111₂ (seven).
14-18  | STRB0 PAGESIZE | Page size for STRB0 accesses. Specifies number of
       |                | MSBs of the address to use to define the bank size
       |                | for STRB0 accesses. See range table in Table 7-4
       |                | on page 7-9.
19-23  | STRB1 PAGESIZE | Page size for STRB1 accesses. Specifies number of
       |                | MSBs of the address to use to define the bank size
       |                | for STRB1 accesses. See range table in Table 7-4
       |                | on page 7-9.
24-28  | STRB ACTIVE    | Specifies address ranges over which STRB0 and
       |                | STRB1 are active. See ranges in Table 7-5 on page
       |                | 7-10 for STRB ACTIVE and Table 7-6 on page 7-11
       |                | for LSTRB ACTIVE.
29     | STRB SWITCH    | Inserts a single cycle between back-to-back reads
       |                | that switch from STRB0 to STRB1 (or vice versa).
       |                | When a 1, insert cycle.
       |                | When a 0, don't insert cycle.
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T Mnemonics used are for the global memory interface control register. For the local memory interface control 
register, add the prefix "L" to each mnemonic (e.g., LCE0, LCE1, LSTRB1, etc.). The description remains
the same for the local memory interface control register. 
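
The field layout can be sketched in a few lines of Python. This is an illustrative model only, not TI-supplied code: the names FIELDS, pack, and unpack are hypothetical, and the bit positions that are not spelled out in Table 7-3 (bits 0-5, 14-23, and 29) are assumptions inferred from the neighboring fields. The reset word is built from the reset conditions listed in Section 7.2, with the four enable pins assumed to read 0.

```python
# Hypothetical model of the memory interface control register layout.
# Each entry is (lowest bit, width in bits). Positions not spelled out in
# Table 7-3 are inferred from the adjacent fields (assumption).
FIELDS = {
    "CE0": (0, 1), "CE1": (1, 1), "DE": (2, 1), "AE": (3, 1),
    "STRB0_SWW": (4, 2), "STRB1_SWW": (6, 2),
    "STRB0_WTCNT": (8, 3), "STRB1_WTCNT": (11, 3),
    "STRB0_PAGESIZE": (14, 5), "STRB1_PAGESIZE": (19, 5),
    "STRB_ACTIVE": (24, 5), "STRB_SWITCH": (29, 1),
}


def pack(**values):
    """Combine named field values into one 32-bit register word."""
    word = 0
    for name, value in values.items():
        low_bit, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        word |= value << low_bit
    return word


def unpack(word):
    """Split a 32-bit register word into a dict of field values."""
    return {
        name: (word >> low_bit) & ((1 << width) - 1)
        for name, (low_bit, width) in FIELDS.items()
    }


# Reset condition from Section 7.2 (enable pins assumed low).
RESET_WORD = pack(
    STRB0_SWW=0b11, STRB1_SWW=0b11,
    STRB0_WTCNT=0b111, STRB1_WTCNT=0b111,
    STRB0_PAGESIZE=0b00111, STRB1_PAGESIZE=0b00111,
    STRB_ACTIVE=0b11110, STRB_SWITCH=1,
)
print(hex(RESET_WORD))  # 0x3e39fff0
```

The same helpers apply to the local register by reading each name with an "L" prefix.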
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Table 7-4. Page Size as Defined by STRBO/1 PAGESIZE Bits t 


STRBx‡     | External Address Bus | External Address Bus | Page Size
PAGESIZE   | Bits Defining the    | Bits Defining        | (32-Bit Words)
Field§     | Current Page         | Address on a Page    |
-----------+----------------------+----------------------+---------------
00000-00110| Reserved             | Reserved             | Reserved
00111      | 30-8                 | 7-0                  | 2^8  = 256
01000      | 30-9                 | 8-0                  | 2^9  = 512
01001      | 30-10                | 9-0                  | 2^10 = 1K
01010      | 30-11                | 10-0                 | 2^11 = 2K
01011      | 30-12                | 11-0                 | 2^12 = 4K
01100      | 30-13                | 12-0                 | 2^13 = 8K
01101      | 30-14                | 13-0                 | 2^14 = 16K
01110      | 30-15                | 14-0                 | 2^15 = 32K
01111      | 30-16                | 15-0                 | 2^16 = 64K
10000      | 30-17                | 16-0                 | 2^17 = 128K
10001      | 30-18                | 17-0                 | 2^18 = 256K
10010      | 30-19                | 18-0                 | 2^19 = 512K
10011      | 30-20                | 19-0                 | 2^20 = 1M
10100      | 30-21                | 20-0                 | 2^21 = 2M
10101      | 30-22                | 21-0                 | 2^22 = 4M
10110      | 30-23                | 22-0                 | 2^23 = 8M
10111      | 30-24                | 23-0                 | 2^24 = 16M
11000      | 30-25                | 24-0                 | 2^25 = 32M
11001      | 30-26                | 25-0                 | 2^26 = 64M
11010      | 30-27                | 26-0                 | 2^27 = 128M
11011      | 30-28                | 27-0                 | 2^28 = 256M
11100      | 30-29                | 28-0                 | 2^29 = 512M
11101      | 30                   | 29-0                 | 2^30 = 1G
11110      | None                 | 30-0                 | 2^31 = 2G
11111      | Reserved             | Reserved             | Reserved

† Mnemonics used are for the global memory interface control register. For
  the local memory interface control register, add the prefix "L" to each
  mnemonic (e.g., LSTRB0 PAGESIZE, LSTRB1 PAGESIZE, etc.). The description
  remains the same for the local memory interface control register.

‡ The "x" in STRBx means that the data in the columns are for STRB0 or STRB1
  as well as for LSTRB0 and LSTRB1, as explained in the note above.

§ An STRBx PAGESIZE field of 10110₂ is depicted in Figure 7-4 on page 7-13.


Table 7-5. Address Ranges Specified by STRB ACTIVE Bits T 


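STRB ACTIVE | STRB0 Active          | STRB0 Active | STRB1 Active
Field       | Address Range         | Range Size   | Address Range
------------+-----------------------+--------------+----------------------
01111       | 8000 0000 - 8000 FFFF | 2^16 = 64K   | 8001 0000 - FFFF FFFF
10000       | 8000 0000 - 8001 FFFF | 2^17 = 128K  | 8002 0000 - FFFF FFFF
10001       | 8000 0000 - 8003 FFFF | 2^18 = 256K  | 8004 0000 - FFFF FFFF
10010       | 8000 0000 - 8007 FFFF | 2^19 = 512K  | 8008 0000 - FFFF FFFF
10011       | 8000 0000 - 800F FFFF | 2^20 = 1M    | 8010 0000 - FFFF FFFF
10100       | 8000 0000 - 801F FFFF | 2^21 = 2M    | 8020 0000 - FFFF FFFF
10101       | 8000 0000 - 803F FFFF | 2^22 = 4M    | 8040 0000 - FFFF FFFF
10110       | 8000 0000 - 807F FFFF | 2^23 = 8M    | 8080 0000 - FFFF FFFF
10111       | 8000 0000 - 80FF FFFF | 2^24 = 16M   | 8100 0000 - FFFF FFFF
11000       | 8000 0000 - 81FF FFFF | 2^25 = 32M   | 8200 0000 - FFFF FFFF
11001       | 8000 0000 - 83FF FFFF | 2^26 = 64M   | 8400 0000 - FFFF FFFF
11010       | 8000 0000 - 87FF FFFF | 2^27 = 128M  | 8800 0000 - FFFF FFFF
11011       | 8000 0000 - 8FFF FFFF | 2^28 = 256M  | 9000 0000 - FFFF FFFF
11100       | 8000 0000 - 9FFF FFFF | 2^29 = 512M  | A000 0000 - FFFF FFFF
11101       | 8000 0000 - BFFF FFFF | 2^30 = 1G    | C000 0000 - FFFF FFFF
11110       | 8000 0000 - FFFF FFFF | 2^31 = 2G    | None
11111       | Reserved              | Reserved     | Reserved

† Address ranges specified by the LSTRB ACTIVE bits are listed in Table 7-6.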
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Table 7-6. Address Ranges Specified by LSTRB ACTIVE Bits? 


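LSTRB ACTIVE | LSTRB0 Active         | LSTRB0 Active | LSTRB1 Active
Field        | Address Range         | Range Size    | Address Range
-------------+-----------------------+---------------+----------------------
01111        | 0000 0000 - 0000 FFFF | 2^16 = 64K    | 0001 0000 - 7FFF FFFF
10000        | 0000 0000 - 0001 FFFF | 2^17 = 128K   | 0002 0000 - 7FFF FFFF
10001        | 0000 0000 - 0003 FFFF | 2^18 = 256K   | 0004 0000 - 7FFF FFFF
10010        | 0000 0000 - 0007 FFFF | 2^19 = 512K   | 0008 0000 - 7FFF FFFF
10011        | 0000 0000 - 000F FFFF | 2^20 = 1M     | 0010 0000 - 7FFF FFFF
10100        | 0000 0000 - 001F FFFF | 2^21 = 2M     | 0020 0000 - 7FFF FFFF
10101        | 0000 0000 - 003F FFFF | 2^22 = 4M     | 0040 0000 - 7FFF FFFF
10110        | 0000 0000 - 007F FFFF | 2^23 = 8M     | 0080 0000 - 7FFF FFFF
10111        | 0000 0000 - 00FF FFFF | 2^24 = 16M    | 0100 0000 - 7FFF FFFF
11000        | 0000 0000 - 01FF FFFF | 2^25 = 32M    | 0200 0000 - 7FFF FFFF
11001        | 0000 0000 - 03FF FFFF | 2^26 = 64M    | 0400 0000 - 7FFF FFFF
11010        | 0000 0000 - 07FF FFFF | 2^27 = 128M   | 0800 0000 - 7FFF FFFF
11011        | 0000 0000 - 0FFF FFFF | 2^28 = 256M   | 1000 0000 - 7FFF FFFF
11100        | 0000 0000 - 1FFF FFFF | 2^29 = 512M   | 2000 0000 - 7FFF FFFF
11101        | 0000 0000 - 3FFF FFFF | 2^30 = 1G     | 4000 0000 - 7FFF FFFF
11110        | 0000 0000 - 7FFF FFFF | 2^31 = 2G     | None

† Address ranges specified by the STRB ACTIVE bits are listed in Table 7-5.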
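
The pattern behind Tables 7-5 and 7-6 is that strobe 0 covers the first 2^(field + 1) words of the bus and strobe 1 covers the remainder. A minimal Python sketch of that lookup follows; the function name strobe_for_address and its parameters are hypothetical, and treating every field value below 01111 as reserved is an assumption based on the lowest entry legible in the tables.

```python
GLOBAL_BASE = 0x8000_0000  # first address on the global bus
LOCAL_BASE = 0x0000_0000   # first address on the local bus
BUS_SPAN = 0x8000_0000     # each bus covers 2G words


def strobe_for_address(addr, active_field, base=GLOBAL_BASE):
    """Return 0 or 1: the strobe that is active for addr on the given bus."""
    # Assumption: values outside 01111..11110 are reserved.
    if not 0b01111 <= active_field <= 0b11110:
        raise ValueError("reserved (L)STRB ACTIVE value")
    if not base <= addr < base + BUS_SPAN:
        raise ValueError("address is not on this bus")
    strobe0_size = 1 << (active_field + 1)  # words covered by strobe 0
    return 0 if addr - base < strobe0_size else 1


# STRB ACTIVE = 10101: STRB0 covers 8000 0000h through 803F FFFFh.
print(strobe_for_address(0x803F_FFFF, 0b10101))  # 0
print(strobe_for_address(0x8040_0000, 0b10101))  # 1
```

Passing base=LOCAL_BASE gives the LSTRB0/LSTRB1 result for a local-bus address.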
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7.3 Use of the Global Memory Interface Registers 
7.3.1 Mapping Addresses to Strobes 


Figure 7-3 demonstrates the relationship between the STRB ACTIVE bits
(defined in Table 7-3, page 7-8) and the address ranges over which
signals STRB0 and STRB1 are active. Note that the address ranges of STRBx
and LSTRBx also govern the ranges of their associated signals RDYx,
LRDYx, R/Wx, LR/Wx, PAGEx, LPAGEx, etc. (where x = 1 or 0).


Figure 7-3. Effects of STRB ACTIVE on Global Memory Bus Memory Map 
Example (a) of Figure 7-3 shows the reset condition (STRB ACTIVE =
11110₂). In this case, signal STRB0 is active over the entire address range
of the global memory bus (see Table 7-5 for the lookup table of STRB ACTIVE).


[Figure 7-3: two example memory maps of the global memory bus.]

(a) STRB ACTIVE = 11110₂: STRB0 is active from 8000 0000h to FFFF FFFFh.
(b) STRB ACTIVE = 10101₂: STRB0 is active from 8000 0000h to 803F FFFFh
    (2^22 = 4M words), and STRB1 is active from 8040 0000h to FFFF FFFFh.

NOTE: Shown here are two examples for the global memory map. The entire 'C40
      memory map (local and global) is shown in Figure 3-9 on page 3-19. Note
      that the highest address for LSTRB1 (local bus) is 7FFF FFFFh.
Example (b) of Figure 7-3 shows the global memory bus memory map
when STRB ACTIVE = 10101₂. In this case, STRB0 is active from addresses
8000 0000h to 803F FFFFh, and STRB1 is active from addresses
8040 0000h to FFFF FFFFh (as shown in Table 7-5 for an STRB ACTIVE of
10101₂).


7.3.2 Page Size Operation 
Figure 7-4. | STRBx PAGESIZE Fields Example 


[Diagram: the 31-bit external address, bits 30 to 0. External address bus
bits 30-23 define the current page; external address bus bits 22-0 define
the address on a page.]

NOTE: This figure represents an STRBx PAGESIZE field value of 10110₂ (as
      shown in Table 7-4 on page 7-9).
The TMS320C40 external interface allows you to specify (using a 31-bit
address) independent page sizes for the different sets of external strobes.
This capability, shown in the example in Figure 7-4, gives you a great deal
of flexibility in the design of external high-speed, high-density memory
systems and the use of slower external peripheral devices.


The STRB0 PAGESIZE and STRB1 PAGESIZE fields in the memory interface
control register (shown in Figure 7-2 on page 7-7) work in the same
manner to specify the page size for the corresponding strobe. Table 7-4
(page 7-9) illustrates the relationship between the PAGESIZE field, the
bits of the address used to define the current page, and the resulting page
size. Page size begins at 256 words (with external address-bus bits 7-0
defining the address on a page) and ranges up to 2G words (with external
address-bus bits 30-0 defining the location on a page). The example in
Figure 7-4 shows how a PAGESIZE field value of 10110₂ is translated into
bits 30-23 defining the current page and bits 22-0 defining the address on
a page.
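
The PAGESIZE translation described above can be sketched as follows. This is a hedged illustration rather than device code: split_address is a hypothetical helper, and it assumes that a field value n in the valid range 00111 to 11110 selects address bits n-0 as the address on a page, which is the pattern visible in Table 7-4.

```python
def split_address(addr, pagesize_field):
    """Split a 31-bit external address into (page, offset, page_size_words)."""
    if not 0b00111 <= pagesize_field <= 0b11110:
        raise ValueError("reserved PAGESIZE value")
    offset_bits = pagesize_field + 1       # bits n-0 address a word on the page
    addr &= 0x7FFF_FFFF                    # external address lines A30-A0
    page = addr >> offset_bits             # remaining upper bits select the page
    offset = addr & ((1 << offset_bits) - 1)
    return page, offset, 1 << offset_bits


# PAGESIZE = 10110: bits 30-23 select the page, bits 22-0 the offset.
print(split_address(0x0080_0001, 0b10110))  # (1, 1, 8388608)
```

With the reset value 00111, the same call yields 256-word pages, so addresses 0FFh and 100h fall on different pages.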
Changing from one page to another causes a cycle to be inserted in the
external access sequence so that external logic can reconfigure itself
appropriately. The memory interface control logic keeps track of the address
used for the last access for each STRB. When an access begins, the PAGE
signal corresponding to the active STRB goes inactive (high) if the access
is to a new page. The PAGE0 and PAGE1 signals are independent of one
another, each having its own page-size logic.


At reset, the page-control logic is initialized so that the extra cycle is inserted 
for the first access to the two strobe interfaces. 
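
The page-tracking behavior described above can be modeled with one small object per strobe. The sketch below is an assumption-laden illustration (the PageTracker class is hypothetical and ignores timing); it only reports whether an access would be treated as a new page, including the first access after reset.

```python
class PageTracker:
    """Behavioral model of the page-change logic for one strobe."""

    def __init__(self, pagesize_field):
        self.offset_bits = pagesize_field + 1
        self.last_page = None  # reset state: the first access counts as a new page

    def access(self, addr):
        """Return True if this access is to a new page (PAGE goes high)."""
        page = (addr & 0x7FFF_FFFF) >> self.offset_bits
        new_page = page != self.last_page
        self.last_page = page
        return new_page


# PAGE0 and PAGE1 are independent, so each strobe gets its own tracker.
page0 = PageTracker(0b00111)  # 256-word pages
page1 = PageTracker(0b10110)  # 8M-word pages
print([page0.access(a) for a in (0x010, 0x020, 0x100, 0x1FF)])
# [True, False, True, False]
```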


The local memory interface has a similar set of control registers. 




7.4 Programmable Wait States 


Control wait-state generation by manipulating memory-mapped control
registers associated with both the global and local interfaces. Use the
STRBx WTCNT field to load an internal timer, and use the STRBx SWW field to
select one of the following four modes of wait-state generation:

- External RDY
- WTCNT-generated RDYwtcnt
- Logical-AND of RDY and RDYwtcnt
- Logical-OR of RDY and RDYwtcnt

Application of wait states and ready signals is covered in Section 13.4 on
page 13-27.

The four modes are used to generate the internal ready signal, RDYint, that
controls accesses. As long as RDYint = 1, the current external access is
extended. When RDYint = 0, the current access completes. Since the use
of programmable wait states for both external interfaces is identical, only
the global bus interface is described in the following paragraphs.


RDYwtcnt is an internally generated ready signal. When an external access
is begun, the value in WTCNT is loaded into a counter. WTCNT may be any
value from 0 through 7. The counter is decremented every H1/H3 clock
cycle until it becomes 0. Once the counter is set to 0, it remains set to 0
until the next access. While the counter is nonzero, RDYwtcnt = 1. While the
counter is 0, RDYwtcnt = 0.


Table 7-7 is the truth table for each value of SWW, showing the different val- 
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Table 7-7.  Wait-State Generation for Each Value of SWW 


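SWW | RDY | RDYwtcnt | RDYint | Description
----+-----+----------+--------+----------------------------------------
00  |  0  |    0     |   0    | RDYint is dependent only upon RDY.
00  |  0  |    1     |   0    | RDYwtcnt is ignored.
00  |  1  |    0     |   1    |
00  |  1  |    1     |   1    |
01  |  0  |    0     |   0    | RDYint is dependent only upon RDYwtcnt.
01  |  0  |    1     |   1    | RDY is ignored.
01  |  1  |    0     |   0    |
01  |  1  |    1     |   1    |
10  |  0  |    0     |   0    | RDYint is the logical-OR (electrical-AND,
10  |  0  |    1     |   0    | since these signals are low true) of RDY
10  |  1  |    0     |   0    | and RDYwtcnt.
10  |  1  |    1     |   1    |
11  |  0  |    0     |   0    | RDYint is the logical-AND (electrical-OR,
11  |  0  |    1     |   1    | since these signals are low true) of RDY
11  |  1  |    0     |   1    | and RDYwtcnt.
11  |  1  |    1     |   1    |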
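
The wait-state logic of Section 7.4 can be sketched as a small cycle-by-cycle model. All names here (rdy_wtcnt, rdy_int, wait_cycles) are hypothetical, the signals are modeled as low-true levels (0 = ready), the SWW encodings follow the row order of Table 7-7, and the exact cycle in which the counter is sampled is a simplifying assumption.

```python
def rdy_wtcnt(wtcnt, cycle):
    """Level of RDYwtcnt in a given cycle; cycle 0 is when WTCNT is loaded."""
    counter = max(wtcnt - cycle, 0)  # decrements each cycle, then stays at 0
    return 1 if counter != 0 else 0


def rdy_int(sww, rdy, wtcnt_level):
    """Internal ready level for one cycle (0 = access completes)."""
    if sww == 0b00:
        return rdy                   # external RDY only
    if sww == 0b01:
        return wtcnt_level           # wait-state counter only
    if sww == 0b10:
        return rdy & wtcnt_level     # logical-OR: either source ends the access
    if sww == 0b11:
        return rdy | wtcnt_level     # logical-AND: both sources must be ready
    raise ValueError("SWW is a two-bit field")


def wait_cycles(sww, wtcnt, rdy_levels):
    """Cycles by which the access is extended, or None if it never completes
    within the supplied sequence of external RDY levels."""
    for cycle, rdy in enumerate(rdy_levels):
        if rdy_int(sww, rdy, rdy_wtcnt(wtcnt, cycle)) == 0:
            return cycle
    return None


print(wait_cycles(0b01, 3, [1] * 8))       # 3
print(wait_cycles(0b11, 2, [1, 1, 1, 0]))  # 3
```

In the second call the counter expires after two cycles, but the reset-default mode (SWW = 11) keeps extending the access until external RDY also goes low.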
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7.5 Timing 
Figure 7-5. STRB and RDY Timing

[Timing diagram: H1, STRB, and RDY.]

Note: Dotted lines emphasize the relationships between signals that are
      explained in the accompanying text below.


Throughout this chapter, no distinction is made between global and local
interface signals and between STRB0 and STRB1, except for clarity.

As shown in Figure 7-5, STRB changes on the falling edge of H1, and RDY
is sampled on the falling edge of H1. Throughout the other timing diagrams
in this section, the following general rules apply to the logical timing of
the parallel external interfaces:


1) Changes of R/W are always framed by STRB.

2) A page boundary crossing for a particular STRB results in the
   corresponding PAGE signal going high for one cycle.

3) R/W transitions are always on an H1 rising.

4) STRB transitions are always on an H1 falling.

5) RDY is always sampled on an H1 falling.

6) On a read, data is always sampled on an H1 falling.

7) On a write, data is always driven out on H1 falling.

8) On a write, data is always stopped from being driven on H1 rising.

9) Following a read, the status and PAGE signals change on H1 falling. The
   address changes on H1's falling edge.

10) Following a write, status and PAGE signals change on H1 falling; the
    address changes on H1 rising.
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11) The fetch of an interrupt vector over an external interface is identified by 
the status signals for that interface (STAT or LSTAT) as a data read. 

12) The interlocked operation status signals (LOCK and LLOCK) have the
    same timing as the STAT and LSTAT status signals, respectively.

13) Any time PAGE goes high, STRB goes high.


Figure 7-6 illustrates a read, read, write sequence. This figure assumes
that all three accesses are to the same page and that they are STRB1
accesses. This timing diagram illustrates that back-to-back reads to the
same page are single-cycle accesses. When the transition from a read to a
write is done, STRB goes high for one cycle in order to frame the R/W signal
changing.


Figure 7-6. | Read Same Page, Read Same Page, Write Same Page Sequence 


[Timing diagram: PAGE1, D31-D0, and STAT3-STAT0 for an STRB1 read, STRB1
read, STRB1 write sequence.]
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Figure 7—7 shows that STRB goes high between back-to-back writes. As in 
Figure 7-6, STRB goes high between a write and a read, and it frames the 
R/W transition. 


Figure 7-7. Write Same Page, Write Same Page, Read Same Page Sequence 


[Timing diagram: STAT3-STAT0 shows STRB1 write, STRB1 write, STRB1 read.]


Note: Strobe and Ready Further Defined 


Strobe and ready are discussed from the application viewpoint in Sections 
13.3 (page 13-20) and 13.4 (page 13-27) respectively. 


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Figure 7-8 shows that going from one page to another on back-to-back 
reads causes an extra cycle to be inserted, and the transition is signaled by 
PAGE going high for one cycle. Also, STRB1 goes high for one cycle. 


Figure 7-8. | Read Same Page, Read Different Page, Read Same Page Sequence 


[Timing diagram: R/W0, STRB0, PAGE0, R/W1, RDY1, PAGE1, and STAT3-STAT0 for
an STRB1 read, STRB1 read, STRB1 read sequence.]


Figure 7-9 shows that on back-to-back writes, when a page switch occurs,
it is signaled with PAGE going high for one cycle.


Figure 7-9. Write Same Page, Write Different Page, Write Same Page Sequence
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e, Write Different Page Sequence 
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Figure 7-10. 
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Figure 7-11. Read Different Page, Read Different Page, Write Same Page Sequence 


[Timing diagram: STAT3-STAT0 shows STRB1 read, STRB1 read, STRB1 write.]


Figure 7-12. Write Different Page, Write Different Page, Read Same Page Sequence 
[Timing diagram: STAT3-STAT0 shows STRB1 write, STRB1 write, STRB1 read.]


[Timing diagram: R/W0, STRB0, PAGE0, R/W1, STRB1, RDY1, PAGE1, D31-D0, and
STAT3-STAT0 for an STRB1 read, STRB1 write, STRB1 read sequence.]


Figure 7-14 to Figure 7-18 illustrate the idle bus cycles. Idle bus cycle
timing is similar to read cycle timing. The primary differences are that no
data is read, STRB is held high, and RDY is ignored.


Figure 7-14. Read Same Page, Idle One Cycle, Read Same Page Sequence 
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Figure 7-15. | Write Same Page, Idle One Cycle, Write Different Page Sequence 
Figure 7-16. Idle, Read Different Page, Idle Sequence


Figure 7-17. Idle, Write Same Page, Idle Sequence
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Figure 7-18. Write Different or Same Page, Idle, Idle Sequence 
Figure 7-19 illustrates an STRB1 read followed by an STRB0 read when
STRB SWITCH = 0. This mode allows the reads to be back to back, with no
cycles inserted between the reads when the back-to-back reads are
activating different strobes.


Figure 7-19. Read Same Page on STRB1, Read Same Page on STRB0, Read Same Page on STRB1
Sequence When STRB SWITCH = 0
Figure 7-20 illustrates an STRB1 read followed by an STRB0 read when
STRB SWITCH = 1. In this mode, a cycle is inserted between back-to-back
reads that activate different strobes. If your system memory configuration
is such that bus conflicts can occur during back-to-back reads on different
strobes, this mode provides one cycle between these strobe transitions to
avoid the bus conflicts.


Figure 7-20. Read Same Page on STRB1, Read Same Page on STRB0, Read
Same Page on STRB1 Sequence When STRB SWITCH = 1
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Figure 7—21 is similar to Figure 7-19 except that the second read using 
STRB1 is to a different page than the first read (using STRB1). 


Figure 7-21. Read Same Page on STRB1, Read Same Page on STRB0, Read Different Page on
STRB1 Sequence When STRB SWITCH = 0
Figure 7-22 is similar to Figure 7-20 except that the second read using
STRB1 is to a different page than the first read (using STRB1).


Figure 7-22. Read Same Page on STRB1, Read Same Page on STRB0, Read Different Page on
STRB1 Sequence When STRB SWITCH = 1
Figure 7-23. Write Same Page on STRB1, Write Same Page on STRB0, Read Same Page on STRB1
Sequence
Figure 7-24. Read With One Wait State


Figure 7-25. Write With One Wait State
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7.6 Using Enabled Signals to Control Signal Group 
Figure 7-26. Using Enabled Signals to Put Signal Groups in a High-Impedance State
Figure 7-26 shows the use of an enable signal to control the corresponding
signal group. For example, signal DE controls the global external-interface
signals D31-D0. The enable signals are unsynchronized inputs that turn off
the corresponding output buffers. Some time period (shown by period (1)
in Figure 7-26) after the enable signal goes high, the corresponding signal
group goes into a high-impedance state. Then, some time period after the
enable signal goes low (period (2) in Figure 7-26), the signal group comes
out of a high-impedance state. Of course, if the signal group is already in
a high-impedance state before the enable signal goes high, the group will
come out of the high-impedance state (when the enable signal goes low)
only if the signal group is in a state requiring it to do so. For example,
a data bus that was not being driven will be driven after being enabled if
an access is pending for the data bus.


If you intend to use internally generated wait states, be certain that
nothing inappropriate occurs when a bus is disabled. This is because
it is possible to have a bus in a high-impedance state and with internally
generated wait states. In this case, data that is written will not be seen
externally, and data that is read will be whatever value is sampled on the
high-impedance bus.
7.7 Interlocked-Instructions Definition and Bus Timing


The LOCK and LLOCK bus-lock signals are manipulated by the interlocked
instructions LDII, LDFI, STII, STFI, and SIGI. As noted, the timing of the
LOCK and LLOCK pins is the same as pins STAT(3-0) and LSTAT(3-0).
Instructions LDII, LDFI, STII, STFI, and SIGI manipulate the bus-lock
signals only when an external memory access is made.


Except for the manipulation of the bus-lock signals, the LDII (Load Integer
Interlocked) and LDFI (Load Floating Point Interlocked) instructions are like
(in all ways) the comparable LDI (Load Integer) and LDF (Load Floating
Point) in terms of the operation performed and the bus operation. LDII and
LDFI perform as follows:

1) The read cycle is begun, and the appropriate bus-lock signal is placed in
   the active-low state.

2) The read cycle is extended until the appropriate ready signal is active.

3) Throughout the read cycle and to its conclusion, the bus-lock signal is
   kept in an active-low state until modified by a subsequent STII, STFI, or
   SIGI instruction.


Figure 7-27 is an example of an LDII or LDFI external access.
Figure 7-27. LDII or LDFI External Access


Except for manipulation of the bus-lock signals, the STII (Store Integer
Interlocked) and STFI (Store Floating Point Interlocked) instructions are
the same as the comparable STI (Store Integer) and STF (Store Floating
Point) in terms of execution and bus operation. STII and STFI operate as
follows:

1) The store cycle is begun, and the appropriate bus-lock signal is kept in
   its current state. In most cases, the interlocked store is preceded by an
   interlocked load, and the bus-lock signal is kept low. Otherwise, the
   bus-lock signal is high, and the interlocked store looks like a
   non-interlocked store.

2) The store cycle is extended until the appropriate ready signal is active.

3) When the corresponding STRB goes high at the end of the store cycle
   (the corresponding STAT(3-0) also changes at this time), the
   corresponding bus-lock signal also goes high.


An STII or STFI instruction to internal memory has no effect on the bus-lock
signals.


Figure 7-28 is an example of an STII or STFI external access following the
previous interlocked load (shown in Figure 7-27) and an idle cycle. This is
the timing for an interlocked load/interlocked store sequence.
Figure 7-28. STII or STFI External Access
The SIGI instruction (signal interlocked) is similar to the LDII and LDFI
instructions. The SIGI functions as follows:

1) The read cycle is begun, and the appropriate bus-lock signal is placed in
   the active-low state.

2) The read cycle is extended until the appropriate ready signal is active.

3) When the read operation is complete, the bus-lock signal is brought
   high with the same timing as the status signals changing.


Figure 7-29 is an example of a SIGI external access.
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Figure 7-29.  SIGI External Access Timing 
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The SIGI instruction can be used in a variety of ways. In some applications, 
you may wish to externally modify semaphores, perhaps with special-pur- 
pose logic. If so, SIGI can be used to perform a single-cycle interlocked ac- 
cess of the semaphore. The SIGI instruction can also be used simply to 
perform an external read and to signal that a particular point in your code 
has been reached.


Figure 7—30 illustrates timing for SIGI if the LOCK signal is already low. This 
could happen when a SIGI follows an LDII instruction. Since LOCK is al- 
ready low, the only effect SIGI has on LOCK is to bring it high. 
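
The effect of the interlocked instructions on the bus-lock pin can be summarized as a small state model. This is a behavioral sketch only: the BusLock class and its execute method are hypothetical, timing is ignored, and the pin is modeled as a level where 0 means an interlocked access is underway.

```python
class BusLock:
    """Behavioral model of one bus-lock pin (LOCK or LLOCK)."""

    def __init__(self):
        self.level = 1  # 1 = no interlocked access underway

    def execute(self, instruction, external=True):
        """Return the pin level during the access; update the level left after it."""
        if not external:
            return self.level        # internal accesses do not touch the pin
        if instruction in ("LDII", "LDFI"):
            self.level = 0           # goes low and stays low after the read
            return 0
        if instruction in ("STII", "STFI"):
            during = self.level      # pin keeps its current state during the store
            self.level = 1           # and goes high when the store cycle ends
            return during
        if instruction == "SIGI":
            self.level = 1           # low during the read, high once it completes
            return 0
        raise ValueError("not an interlocked instruction")


lock = BusLock()
print(lock.execute("LDII"), lock.level)  # 0 0
print(lock.execute("STII"), lock.level)  # 0 1
```

An STII or STFI that is not preceded by an interlocked load returns 1 here, which matches the statement that such a store looks like a non-interlocked store.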
Figure 7-30. SIGI When LOCK Is Already Low
7.8 IACK Timing


The IACK pin is affected by the IACK (interrupt acknowledge) instruction.
The timing of the pin is similar to that of the LOCK pin when used by the SIGI
instruction. In all respects (timing, extension with wait states, etc.) the
IACK pin behaves like a LOCK or STAT signal. The only difference is that
there is only one IACK pin.


The timing for the IACK pin is shown in Figure 7-31. Like the interlocked
instructions, the IACK instruction affects IACK only for an external access.
Figure 7-31. IACK Timing
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This chapter provides technical information for the communication ports of 
the TMS320C40 digital signal processor (DSP). This chapter is divided into 
the following major sections: 


Section Page 
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Introduction 


A parallel processor system supports optimum system performance by
distributing tasks between two or more processors. This sharing of tasks
between two or more TMS320C40 DSPs requires that each be able to pass the
results of its work to another; passing of results enables both DSPs to
continue working. Processor-to-processor communication is critical in
multiprocessor-system design.


High-performance multiprocessing requires rapid transfer of data between 
processors. To ensure this rapid transfer of data, the TMS320C40 provides 
the following: 


❏ Shared memory: The ’C40 global- and local-memory interfaces
  enable easy construction of efficient multiprocessor-based shared
  memory systems.

❏ High-speed communication ports: The ’C40’s six high-speed
  bidirectional communication ports provide rapid processor-to-processor
  communication on six dedicated communication interfaces.


Although memory sharing has advantages in some applications, a shared 
bus seriously limits processor communication bandwidth for many applica- 
tions. Using the high-speed communication ports eliminates this obstacle. 
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Communication Port Features 
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8.2 Communication Port Features 


Key features of each TMS320C40 communication port: 

❏ 160-megabit-per-second (5-megaword-per-second) bidirectional data
  transfer operations (at 40-ns cycle time)

❏ direct (glueless) processor-to-processor communication via eight data
  lines and four control lines

❏ buffering of all data transfers, both input and output

❏ automatic arbitration and handshaking to ensure communication
  synchronization

❏ synchronization between the CPU or direct-memory access (DMA)
  coprocessor and the six communication ports via internal interrupts and
  internal ready signals

❏ support of a wide variety of multiprocessor architectures, including
  rings, trees, hypercubes, bidirectional pipelines, two-dimensional
  Euclidean grids, hexagonal grids, and three-dimensional grids.


Figure 8-1. Communication Port Block Diagram
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8.3 Operational Overview 


The ’C40 contains six identical high-speed communication ports, each of 
which provides a bidirectional communication interface to an external de- 
vice. Figure 8—1 shows the internal architecture of a single communication 
port. Each port contains the following components: 


❏ Input FIFO channel: provides an 8-level, 32-bit wide first-in-first-out
  (FIFO) input buffer that isolates the ’C40 from the port communication
  data bus and buffers data received from an external device via the bus.

❏ Output FIFO channel: provides an 8-level, 32-bit wide FIFO output
  buffer that isolates the ’C40 from the port communication data bus and
  buffers data to be sent to an external device via the bus.

❏ Port arbitration unit (PAU): handles the arbitration tasks associated
  with the movement of data between a ’C40 and an external device via
  the port communication data bus. Signals arbitrated and controlled by
  the PAU are shown in Figure 8-2. The PAU is described in detail in
  subsection 8.5.1 on page 8-12.

❏ Communication port control register (CPCR): allows you to control
  the communication port functions and data transfer operations between
  a ’C40 and an external device via the communication port data bus.


Figure 8-2. TMS320C40 Communication-Port Interface-Connection Example


Figure 8-2 is an example of two ’C40 DSPs connected via their
communication ports. This simple communication interface consists of the
following bidirectional control and data lines:


❏ CREQx: communication port token request. A ’C40 activates this
  signal to request the use of the communication port data bus.
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❏ CACKx: communication port token acknowledge. A ’C40 activates
  this signal to relinquish ownership of the communication port data bus
  upon receiving a CREQx from another ’C40.

❏ CSTRBx: communication port strobe. A sending ’C40 activates this
  signal to indicate that it has placed valid data on the communication port
  data bus.

❏ CRDYx: communication port ready. A receiving ’C40 activates this
  signal to indicate that it has received data via the communication port
  data bus.

❏ CxD(7-0): communication port data bus. This bus carries data
  bidirectionally between two ’C40s or between a ’C40 and some other
  device.


Figure 8-2 shows two ’C40s connected via their communication ports. The
communication port data bus, CxD(7-0), and its associated control signals
transfer data in either direction between ’C40s A and B. The PAUs in the two
’C40s cooperate to generate the signals and control sequences necessary
to ensure orderly data transfers at the highest possible rate. To avoid
conflicts on the bus, these PAUs arbitrate bus ownership, allowing only one
DSP to transmit at any given time. Either of the PAUs can relinquish bus
ownership when the other DSP has data to send.


Signals CREQx and CACKx handle the handshaking arbitration between
the two DSPs:

1) The PAU that does not own the data bus (CxD(7-0)) activates CREQx
   to request bus ownership.

2) The PAU owning the bus then activates CACKx to acknowledge the
   request and relinquish bus ownership to the requesting PAU.

3) In this manner, these signals transfer a token (or priority) from one PAU
   to another, and the PAU receiving the token gains ownership of the bus.


During a data transfer operation: 


1) The CPU or DMA coprocessor of the sending DSP writes data to the
   output FIFO (of a communication port) via a memory-mapped address
   (listed in Figure 8-3).

2) The communication port then places the data on CxD(7-0) and
   activates CSTRBx to signal the receiving communication port that the bus
   contains valid data.

3) Upon receiving the data in its input FIFO, the receiving communication
   port activates CRDYx to indicate that it has received the data.

4) The CPU or DMA coprocessor of the receiving DSP may then read the
   data from the input FIFO via a memory-mapped address (listed in
   Figure 8-3).
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Each of the input and output FIFOs can buffer a maximum of eight 32-bit 
words. 


Buffering provided by the input and output FIFOs is very important. This
buffering allows for a high degree of decoupling of computation and
communication overhead. When ’C40s A and B are connected via their
communication ports, the effective length of the FIFOs becomes 16 levels.
This is because the output path from A to B is the concatenation of the
eight levels of the output FIFO of A with the eight levels of the input
FIFO of B. The same applies to the output path from B to A.
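
The 16-level effective depth can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions: the `cp_fifo` structure, the function names, and the idea that words move across the link whenever the receiving FIFO has room are all hypothetical simplifications, not a description of the actual hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CP_FIFO_DEPTH 8   /* each FIFO buffers eight 32-bit words */

/* Hypothetical software model of one 8-level FIFO (ring buffer). */
struct cp_fifo {
    uint32_t data[CP_FIFO_DEPTH];
    int head;    /* index of the oldest word */
    int count;   /* number of full positions */
};

/* Append a word; returns false when the FIFO is full. */
static bool cp_fifo_put(struct cp_fifo *f, uint32_t w)
{
    assert(f != 0);
    if (f->count == CP_FIFO_DEPTH)
        return false;
    f->data[(f->head + f->count) % CP_FIFO_DEPTH] = w;
    f->count++;
    return true;
}

/* Remove the oldest word; returns false when the FIFO is empty. */
static bool cp_fifo_get(struct cp_fifo *f, uint32_t *w)
{
    assert(f != 0 && w != 0);
    if (f->count == 0)
        return false;
    *w = f->data[f->head];
    f->head = (f->head + 1) % CP_FIFO_DEPTH;
    f->count--;
    return true;
}

/* Move words from A's output FIFO to B's input FIFO until either
   A's FIFO is empty or B's FIFO is full. */
static void cp_link_drain(struct cp_fifo *out_a, struct cp_fifo *in_b)
{
    uint32_t w;
    while (in_b->count < CP_FIFO_DEPTH && cp_fifo_get(out_a, &w))
        cp_fifo_put(in_b, w);
}

/* Count how many words A can write before blocking if B never reads. */
static int cp_words_accepted_without_reads(void)
{
    struct cp_fifo out_a = {{0}, 0, 0};
    struct cp_fifo in_b = {{0}, 0, 0};
    int n = 0;
    while (cp_fifo_put(&out_a, (uint32_t)n)) {
        n++;
        cp_link_drain(&out_a, &in_b);
    }
    return n;
}
```

In this model, `cp_words_accepted_without_reads()` counts eight words that end up in B's input FIFO plus eight that remain in A's output FIFO, which matches the 16-level figure given above.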
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8.4 Communication Port Memory Map and Registers 


Figure 8-3. 
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Figure 8—3 shows the memory map for the C40 communication port control 
registers (CPCRs) and their associated input FIFOs and output FIFOs. The 
lowest three addresses of each port’s 16-address block are mapped to a 
CPCR and its associated input and output FIFOs. Fields (bits) within a 
CPCR are shown in Figure 8-4.
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Communication Port Memory Map and Registers 


For example, the addresses for communication port 0 point to (see
Figure 8-3):

❏ address 0010 0040h: CPCR 0
❏ address 0010 0041h: input port register 0, FIFO level 0
❏ address 0010 0042h: output port register 0, FIFO level 7
❏ address range 0010 0043h-0010 004Fh: reserved.
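
As a rough illustration of this layout, the following C sketch computes register addresses for a given port. It reads the port 0 addresses as 0010 0040h through 0010 0042h, and it assumes that the six 16-address blocks follow one another contiguously; that stride, and all macro and function names, are assumptions made for the example and should be checked against Figure 8-3.

```c
#include <assert.h>
#include <stdint.h>

/* Port 0 CPCR address as read from the text; the stride between port
   blocks is an assumption based on the "16-address block" wording. */
#define CP_BASE_PORT0  0x00100040u
#define CP_BLOCK_SIZE  0x10u        /* assumed spacing between ports */
#define CP_NUM_PORTS   6u

/* Offsets of the three mapped registers within a port's block. */
enum cp_reg {
    CP_CPCR   = 0,   /* communication port control register */
    CP_INPUT  = 1,   /* input port register (FIFO level 0)  */
    CP_OUTPUT = 2    /* output port register (FIFO level 7) */
};

/* Return the word address of register `reg` of port `port`. */
static uint32_t cp_reg_addr(unsigned port, enum cp_reg reg)
{
    assert(port < CP_NUM_PORTS);
    return CP_BASE_PORT0 + port * CP_BLOCK_SIZE + (uint32_t)reg;
}
```

For port 0 this reproduces the three addresses listed above; addresses for the other ports follow only if the contiguous-block assumption holds.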


8.4.1 Communication Port Control Registers (CPCRs) 


Figure 8—4 shows the format of a TMS320C40 CPCR, which contains con- 
trol and status bits for its associated communication port. Table 8—1 lists the 
CPCR bits and fields and describes their functions. Figure 8—3 lists the 
memory locations of the CPCRs. 


If a full output port is written to, the peripheral bus interface latches
the word written. On subsequent accesses to the peripheral bus, a not-ready
is given. This condition clears when an empty position appears in the
output FIFO; the word held in the peripheral bus input latch is then
transferred to the output buffer at the communication port.


8.4.2 Input Port Register 


This read-only register contains the contents of position 0 of the input FIFO, 
the oldest value in the FIFO. If this register is written to, its contents remain 
unchanged. 


8.4.3 Output Port Register


This write-only register interfaces to position 7 of the output FIFO
(level 7, the newest value in the FIFO). If this register is read, its
contents remain unchanged, and the value read is undefined (garbage).
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Figure 8-4. Communication Port Control Register


Notes: 1. CPCRs are shown in the memory map on page 8-8. Table 8-1
          describes the various CPCR fields.
       2. xx = reserved bit (read/write as zero).
       3. R = read, W = write.
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Table 8-1. | CPCR Bit Functions 


Reserved
    Undefined.

Bit 2, PORT DIR
    Port Direction. Bit determines the direction of data transfer
    operations for the communication port.
    • PORT DIR = 0: port is in the output mode.
    • PORT DIR = 1: port is in the input mode.

ICH
    Input Channel Halt.
    • Write a 1 to ICH to halt the input channel. When the input
      channel is halted, PORT DIR is set to zero.
    • Set ICH to 0 when the input channel is to be unhalted;
      otherwise, the input channel cannot signal externally when it is
      ready to receive.

OCH
    Output Channel Halt.
    • Write a 1 to this bit to immediately halt the output channel.
      However, the communication port is still able to accept a token
      request from the input channel.
    • Set this bit to 0 to allow the output channel to transfer data.

(Table concluded on next page) 
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Table 8-1. CPCR Bit Functions (Concluded) 


OUTPUT LEVEL
    Output FIFO Level. Contents of this 4-bit field:
    • 0000₂ (0): indicates an empty output FIFO.
    • 0001₂ (1) through 0111₂ (7): indicates the number of full
      positions in the output FIFO.
    • 1111₂ (15): indicates a full output FIFO.

    An empty output buffer (OUTPUT LEVEL = 0000₂) causes an unlatched,
    positive level-triggered interrupt (OCEMPTY = 1) to be sent to the
    CPU. When the CPU or DMA coprocessor writes to the empty output
    FIFO, OCEMPTY is set to 0, and it remains in that state until the
    buffer is again empty. An output FIFO with one or more empty levels
    also causes an unlatched, positive level-triggered interrupt
    (OCRDY = 1) to be sent to the CPU and the DMA coprocessor. This
    condition causes a READY/NOT READY signal to be generated when the
    CPU or DMA coprocessor attempts to write to the output FIFO.

Bits 9-12, INPUT LEVEL
    Input FIFO Level. Contents of this 4-bit field:
    • 0000₂ (0): indicates an empty input FIFO.
    • 0001₂ (1) through 0111₂ (7): indicates the number of full
      positions in the input FIFO.
    • 1111₂ (15): indicates a full input FIFO.

    A full input FIFO (INPUT LEVEL = 1111₂) causes an unlatched,
    positive level-triggered interrupt (ICFULL = 1) to be sent to the
    CPU. When the CPU or DMA coprocessor reads from the full input
    FIFO, ICFULL is set to 0 and remains in that state until the FIFO
    is again full. An input FIFO with one or more full levels also causes
    an unlatched, positive level-triggered interrupt (ICRDY = 1) to be
    sent to the CPU and the DMA coprocessor. This condition causes
    a READY/NOT READY signal to be generated when the CPU or
    DMA coprocessor attempts to read from the input FIFO.

Bits 13-31, Reserved
    Undefined.
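
The field descriptions in Table 8-1 can be turned into a small decoding sketch. Only the PORT DIR position (bit 2) and the INPUT LEVEL position (bits 9-12) are legible in the table; the positions used below for ICH, OCH, and OUTPUT LEVEL are assumptions inferred from the order of the table rows, and all macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CPCR_PORT_DIR        (1u << 2)  /* 0 = output mode, 1 = input mode */
#define CPCR_ICH             (1u << 3)  /* assumed bit position */
#define CPCR_OCH             (1u << 4)  /* assumed bit position */
#define CPCR_OUT_LEVEL_SHIFT 5          /* assumed field position */
#define CPCR_IN_LEVEL_SHIFT  9          /* bits 9-12 per Table 8-1 */
#define CPCR_LEVEL_MASK      0xFu

/* Convert a 4-bit level field to a word count: 0..7 map directly,
   1111 means all eight positions are full. Other codes are not
   described in the table, so -1 is returned for them. */
static int cpcr_level_to_words(uint32_t field)
{
    field &= CPCR_LEVEL_MASK;
    if (field <= 7u)
        return (int)field;
    return (field == 0xFu) ? 8 : -1;
}

/* Inverse mapping, useful when building test values. */
static uint32_t cpcr_words_to_level(int words)
{
    assert(words >= 0 && words <= 8);
    return (words == 8) ? 0xFu : (uint32_t)words;
}

static int cpcr_input_words(uint32_t cpcr)
{
    return cpcr_level_to_words(cpcr >> CPCR_IN_LEVEL_SHIFT);
}

static int cpcr_output_words(uint32_t cpcr)
{
    return cpcr_level_to_words(cpcr >> CPCR_OUT_LEVEL_SHIFT);
}

static int cpcr_is_input_mode(uint32_t cpcr)
{
    return (cpcr & CPCR_PORT_DIR) != 0;
}
```

The main point of the sketch is the level encoding: a full FIFO reads as 15 rather than 8, so the field cannot be used directly as a word count.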
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8.5 Communication Port Operation 


8.5.1 Port Arbitration Units (PAUs)


The PAU is responsible for arbitrating between two devices to determine 
which device has possession of the communication port data bus at any 
given time. This arbitration allows the bus ownership token to be passed 
back and forth between two devices connected via their communication 
ports. During this arbitration process, the PAU is in one of the four states 
listed in Table 8—2. 


Table 8-2. PAU State Definitions

State   Definition

00₂     The PAU currently has possession of the bus ownership token,
        and its associated communication channel is not in use. Under
        this condition, the PORT DIR bit of the associated CPCR is 0
        (output).

01₂     The PAU currently does not have possession of the bus
        ownership token and has not requested the token. Under this
        condition, the PORT DIR bit equals 1 (input), and the OUTPUT
        LEVEL field equals 0 (empty output FIFO).

10₂     The PAU currently has possession of the bus ownership token,
        and its associated communication channel is in use. Under this
        condition, the PORT DIR bit equals 0 (output), and the OUTPUT
        LEVEL field does not equal 0.

11₂     The PAU currently does not have possession of the bus
        ownership token but has requested the token. Under this
        condition, the PORT DIR bit equals 1 (input), and the OUTPUT
        LEVEL field does not equal 0.


Figure 8—5 shows the state diagram and controlling equations for the PAU 
state transitions. The figure also includes comments describing how the 
state transitions correspond to various system-level processes. 


To place data on the communication port data bus, the PAU must arbitrate
between:

❏ on-chip requests to output data on the communication channel data bus
  (CxD(7-0))

❏ external requests received via the CREQ line

This arbitration is accomplished by passing the bus-ownership token
between PAUs associated with different communication ports. The PAU
containing the token has ownership of the communication port data bus. At
system reset, half of the communication channels associated with a
particular ’C40 have token ownership (communication ports 0, 1, 2), and
the other half (communication ports 3, 4, 5) do not. This token passing is
done via the CREQ and CACK lines.

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Figure 8-5. Communication Port Arbitration Unit State Diagram

(State diagram. Transition labels: other PAU requests token, token
released and passed using CACK (TOKRQ = 1); bus being used (BUSRQ = 1);
finished a one-word transfer (BUSRQ = 0); request token from other PAU
using CREQ; token received from other PAU over CACK (BUSAK); idle
condition BUSRQ = 0, TOKRQ = 0.)


To help understand the port arbitration scheme represented in Figure 8-5,
consider a data transfer operation from ’C40 A to ’C40 B. The transfer
begins with PAU A in state 00₂ and PAU B in state 01₂. If PAU A receives a
request (BUSRQ = 1) from its output buffer to use the communication port
data bus, it allows the output buffer to transmit one word immediately and
enters state 10₂. After the output buffer transmits one word, it removes
the bus request (BUSRQ = 0), and PAU A returns to state 00₂.


If PAU B receives a request from its output buffer to use the bus, it
activates CREQ to request the token from PAU A. PAU A detects this request
via the state variable TOKRQ and then activates the CACK line to transfer
the bus ownership token to PAU B. PAU B then generates an internal bus
acknowledge (BUSACK) to indicate that it has gained bus ownership. As a
result of this token transfer operation, PAU A enters state 01₂, and PAU B
enters state 10₂.
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Because a PAU always returns to state 00₂ after transmitting a single word,
token passing can be accomplished by ’C40s A and B alternately transmit- 
ting single words. This process provides a fair means of bus arbitration that 
prevents either of the output buffers (A’s or B’s) from being continually 
blocked. 
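
The state transitions described in this subsection can be summarized as a small state machine. The following C sketch is a simplified model built only from the prose above; the controlling equations of Figure 8-5 are not reproduced here, so the priority given to a pending token request over a local bus request in state 00 is an assumption, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* PAU states, numbered as in the text (binary 00 through 11). */
enum pau_state {
    PAU_OWNER_IDLE   = 0,  /* 00: has token, channel not in use  */
    PAU_NOTOKEN_IDLE = 1,  /* 01: no token, token not requested  */
    PAU_OWNER_BUSY   = 2,  /* 10: has token, channel in use      */
    PAU_REQUESTING   = 3   /* 11: no token, token requested      */
};

/* One arbitration step.
   busrq: the local output buffer requests the bus (BUSRQ).
   tokrq: the other PAU requests the token via CREQ (TOKRQ).
   busak: the token has been received over CACK (bus acknowledge). */
static enum pau_state pau_step(enum pau_state s,
                               bool busrq, bool tokrq, bool busak)
{
    assert(s >= PAU_OWNER_IDLE && s <= PAU_REQUESTING);
    switch (s) {
    case PAU_OWNER_IDLE:
        if (tokrq)                      /* assumed priority: pass token */
            return PAU_NOTOKEN_IDLE;
        return busrq ? PAU_OWNER_BUSY : s;
    case PAU_OWNER_BUSY:
        /* Return to 00 after the one-word transfer completes. */
        return busrq ? s : PAU_OWNER_IDLE;
    case PAU_NOTOKEN_IDLE:
        return busrq ? PAU_REQUESTING : s;
    case PAU_REQUESTING:
        return busak ? PAU_OWNER_BUSY : s;
    }
    return s;
}
```

Because the owner passes through state 00 after every word, a waiting request from the other PAU is seen at each word boundary, which is the fairness property noted above.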


If an input buffer becomes full, it will not activate CRDY at the beginning
of the transmission of the first byte that would overflow the buffer. This
condition prevents data transfer operations in either direction until the
situation is resolved, which is done by reading data from the full input
buffer.


8.5.2 Module Reset 


At system reset, the input and output channels both assume an empty state, 
causing all values in the input and output buffers to be lost. The CREQ, 
CACK, CSTRB, and CRDY signals assume an inactive (high) state and 
CxD(7-0) enters its tristate mode (see Figure 8-14 and Figure 8—15 on 
page 8-30). These signals remain in these states as long as system reset is 
active and, following system reset, the value placed on CxD(7-0) by the 
communication port that is configured for output is undefined. 


At system reset, communication ports 0, 1, and 2 assume the following 
states: 


❏ PAU is reset to state 00₂: The PAU has possession of the bus
  ownership token, and the channel is not in use.

❏ ICRDY = 0: The input channel is empty and is not ready to be read from.

❏ ICH = 0: The input channel is not in its halted state.

❏ OCRDY = 1: The output channel is not full and is ready to be written
  to.

❏ OCH = 0: The output channel is not in its halted state.

❏ PORT DIR = 0: The communication port is configured for output
  operation.

❏ INPUT LEVEL = 0: The input channel is empty.

❏ OUTPUT LEVEL = 0: The output channel is empty.


At system reset, communication ports 3, 4, and 5 assume the following 
states: 


❏ PAU is reset to state 01₂: The PAU does not have possession of the bus
  ownership token, and the token is not requested.

❏ ICRDY = 0: The input channel is empty and is not ready to be read from.

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❏ ICH = 0: The input channel is not in its halted state.

❏ OCRDY = 1: The output channel is not full and is ready to be written
  to.

❏ OCH = 0: The output channel is not in its halted state.

❏ PORT DIR = 1: The communication port is configured for input
  operation.

❏ INPUT LEVEL = 0: The input channel is empty.

❏ OUTPUT LEVEL = 0: The output channel is empty.


Based on these reset conditions, ports 0, 1, and 2 of one DSP should be 
connected to ports 3, 4, and 5 of the other. 
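
This pairing rule can be expressed as a simple check. The sketch below assumes only what is stated above (ports 0 through 2 hold the token after reset, ports 3 through 5 do not); the function names are hypothetical, and the check says nothing about links whose token state is changed by software after reset.

```c
#include <assert.h>
#include <stdbool.h>

#define CP_PORT_COUNT 6

/* True if the port's PAU holds the bus-ownership token after reset
   (ports 0, 1, and 2 according to the reset description). */
static bool cp_has_token_at_reset(int port)
{
    assert(port >= 0 && port < CP_PORT_COUNT);
    return port < 3;
}

/* A link between two ports starts in a consistent state only if
   exactly one end holds the token after reset. */
static bool cp_link_ok_at_reset(int port_a, int port_b)
{
    return cp_has_token_at_reset(port_a) != cp_has_token_at_reset(port_b);
}
```

For example, connecting port 0 of one DSP to port 3 of another passes the check, while connecting port 1 to port 2 does not, because both ends would start out owning the token.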


8.5.3 Halting of Input and Output FIFOs 


The halting of the input and output FIFOs of a communication channel is 
controlled by the ICH and OCH bits (input-channel and output-channel halt 
bits) of the communication port control register (Figure 8—4 on page 8-10). 
The goal of input FIFO halting is to halt the input FIFO as soon as possible, 
but without the loss of data being input. A summary of the halt/unhalted con- 
ditions is provided in Table 8—3 on page 8-16. 


When the input FIFO is halted, it will not signal a ready when the first incom- 
ing byte is received. At that point, the data transfer is frozen until the input 
FIFO is unhalted or a system reset occurs. If the input FIFO is unhalted later, 
the transfer will continue without any loss of data. 


A communication port with an FIFO that is either halted or is full and inactive 
will not acknowledge a token request. This assures that the communication 
port’s output channel remains open. 


If a communication port’s input FIFO is halted during a token request from
the communication port to which it is connected, then the token request is 
acknowledged before halting. 
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Table 8-3. Summary of Input and Output FIFO Halting

Input halted, output unhalted:
    • Won't release token
    • Will transmit data
    • Won't signal ready when first byte is received (transfer frozen)
    • If halted after first byte is received, it will receive rest of
      word (will signal ready and then halt the input)

Input unhalted, output halted:
    • Won't transmit data
    • If halted after first byte sent, will complete word transfer and
      then halt the output
    • Will release token
    • Will receive data
    • Will not request token

Input halted, output halted:
    • Won't release token
    • Won't transmit data
    • If halted after first byte sent, will complete word transfer and
      then halt the output
    • Won't signal ready when first byte is received (transfer frozen)
    • If halted after first byte received, it will receive rest of word
      and then halt the input
    • Will not request token


Output FIFO halting is analogous to input FIFO halting. Assume that DSP A 
output FIFO has OCH = 1. Then the output FIFO will be halted, based upon 
its current state. 


❏ If communication port A does not have the token, the output FIFO is
  halted, and no request is made for the token.

❏ If communication port A has the token and is currently transmitting a
  word, then after the word is transmitted, no new transfers will be begun.

❏ If communication port A has the token and the input FIFO is not halted
  and the output FIFO is halted, then it will transfer the token when
  requested by communication port B.

❏ If communication port A has the token and the input FIFO is halted and
  the output FIFO is halted, then it will not transfer the token when
  requested by communication port B.

❏ When coming out of the halted state, if the communication channel still
  has the token, it may transmit data if necessary. If it needs the token,
  it will arbitrate for the token as usual.
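
The halting rules above and in Table 8-3 reduce to a few boolean relations on the ICH and OCH bits. The following C sketch restates them as predicates; it is a simplified reading of the table, it ignores the separate case of a full input FIFO, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Halt bits of one communication port. */
struct cp_halt {
    bool ich;   /* input channel halted  */
    bool och;   /* output channel halted */
};

/* The token is released on request only if the input is not halted. */
static bool cp_releases_token(struct cp_halt h)  { return !h.ich; }

/* A halted output neither requests the token nor starts a new word. */
static bool cp_requests_token(struct cp_halt h)  { return !h.och; }
static bool cp_starts_new_word(struct cp_halt h) { return !h.och; }

/* A halted input does not signal ready for the first byte of a word. */
static bool cp_signals_ready(struct cp_halt h)   { return !h.ich; }

/* Self-check that encodes the three rows of Table 8-3. */
static void cp_halt_selfcheck(void)
{
    struct cp_halt in_halted  = { true,  false };
    struct cp_halt out_halted = { false, true  };
    struct cp_halt both       = { true,  true  };

    assert(!cp_releases_token(in_halted) && cp_starts_new_word(in_halted));
    assert(!cp_signals_ready(in_halted));

    assert(cp_releases_token(out_halted) && !cp_starts_new_word(out_halted));
    assert(cp_signals_ready(out_halted) && !cp_requests_token(out_halted));

    assert(!cp_releases_token(both) && !cp_starts_new_word(both));
    assert(!cp_signals_ready(both) && !cp_requests_token(both));
}
```

Note that a word already in progress when a halt bit is set is still completed, as the table states; the predicates describe only what happens at the next word boundary.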
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8.6 Coordinating Communication Port Activity With CPU and 
DMA Coprocessors 
The communication ports support two principal modes of synchronization:

❏ a ready/not-ready signal that can halt CPU and DMA accesses to a
  communication port

❏ interrupts that can be used to signal the CPU and DMA


The most basic synchronization mechanism is based on a ready/not-ready
signal. If the DMA or CPU attempts to read an empty input FIFO or write to
a full output FIFO, a not-ready signal is returned, and the DMA or CPU
continues to attempt the read or write until a ready signal is received.
The ready signal for the output channel is OCRDY (output channel ready),
which is also an interrupt signal. The ready signal for the input channel
is ICRDY (input channel ready), which is also an interrupt signal.


Interrupts are often a useful form of synchronization. Each communication 
port generates four different interrupt signals, as listed below (interrupt traps 
for these are shown in Figure 3-8 on page 3-16): 


❏ ICFULL (input channel full)

❏ ICRDY (input channel ready)

❏ OCRDY (output channel ready)

❏ OCEMPTY (output channel empty)


The CPU can respond to all four of these interrupt signals. The DMA
coprocessor can respond to the ICRDY and OCRDY interrupt signals.
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Figure 8-6. 
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In order to accurately describe the timing of the operation of the
communication ports, it is important to differentiate between the internal
signals applied to the pins and the external signal seen. All signals are
buffered and can be placed in a high-impedance state. See Figure 8-6.


In this discussion, internal signals applied to a buffer are identified by
suffixes:

❏ a suffix ’a’ for processor A (for example, CSTRBa)
❏ a suffix ’b’ for processor B (for example, CSTRBb)
❏ a suffix ’ab’ for the external signal between the two connected
  communication ports (for example, CSTRBab and CREQab)
❏ a suffix followed by a single quote for the value that the processor
  sees by sampling the input pad (for example, CSTRBa’)


Signal-Naming Example 
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8.7.1 Timing Table and Figures 
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Table 8—4 and the timing figures that follow depict timing sequences in com- 
munication between TMS320C40s using their communication ports. 
Table 8—4 lists handshaking and communication during this intercommuni- 
cation. Steps in the table are shown by numbers in the figures. Events 1 
through 36 in the table compose a token request and token transfer se- 
quence. 
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Figure 8—7 Token Transfer Sequence (page 8-23). 
1) At the start, the communication port on processor A has the token and
   is idle.
2) The communication port on processor B requests the token and, after
   receiving the token, transfers a word, one byte at a time:
   a) the first byte is bits 7-0
   b) the second is bits 15-8
   c) the third is bits 23-16
   d) the fourth is bits 31-24
3) Once a token-requesting communication port receives the token
   request acknowledge, it will always transmit a word.
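
The byte ordering in step 2 (least significant byte first) can be sketched in C as follows. The functions only illustrate how a 32-bit word maps onto four successive byte transfers; the names are hypothetical, and no handshaking is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit word into the four bytes placed on the data bus,
   in transmission order: bits 7-0 first, bits 31-24 last. */
static void cp_word_to_bytes(uint32_t word, uint8_t bytes[4])
{
    assert(bytes != 0);
    for (int i = 0; i < 4; i++)
        bytes[i] = (uint8_t)(word >> (8 * i));
}

/* Reassemble a word from four bytes received in transmission order. */
static uint32_t cp_bytes_to_word(const uint8_t bytes[4])
{
    assert(bytes != 0);
    uint32_t word = 0;
    for (int i = 0; i < 4; i++)
        word |= (uint32_t)bytes[i] << (8 * i);
    return word;
}
```

For the word 12345678h, the bytes would appear on the bus in the order 78h, 56h, 34h, 12h.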
Figure 8-8 End of Token Transfer Sequence Followed by a Word Transfer 
and the Beginning of a Second Word Transfer (page 8-24). 


Figure 8-9 End of a Word Transfer Followed by a Word Transfer (page 
8-25). 


Figure 8-10 End of a Word Transfer Followed by an Idle State and Token 
Transfer (page 8-26). 


1) The communication port data bus becomes idle because the output
   FIFO on processor B is empty.

2) The communication port on processor A requests the token, which is
   then transferred to it by the communication port on processor B.

Figure 8-11 End of a Word Transfer Followed by an Overlapping Token
Transfer (page 8-27).

1) As shown, the token request is received by the communication port on
   processor B.

2) The communication port on processor B sees the ready signal for the
   last byte of the word being transmitted.

3) Then the communication port releases the token.

4) However, the communication port will not release the token if the token
   request is received by the processor port B after the processor port B
   sees the ready signal for the last byte of the word being transmitted.

5) If the communication port on processor B does not have another word in
   the output FIFO to transmit, it will release the token.


Figure 8—12 End of the Transfer of the Last Word in an Output FIFO Fol- 
lowed by an Idle Condition Until Another Word Is Available to Be Transferred 
(page 8-28). This begins with a word transfer followed by an idle state due to 
an empty output FIFO. Then a word is written to the output FIFO and trans- 
ferred. 


Figure 8-13 End of a Word Transfer Followed by a Not Ready Due to the
Input FIFO Becoming Full, Continuing Once the Input FIFO Is No Longer
Full (page 8-29). This shows the use of the ready line to generate wait
states.
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1) In this case, a word is transferred that fills the input FIFO of the
   communication port of processor A.
2) At the beginning of transmission of the next word, the communication
   port on processor A does not signal that it is ready until the input
   FIFO is no longer full.


Table 8-4. | Handshaking Events in Communication Port Intercommunication 


Event No.†  Description

1           B requests the token by bringing CREQb low.
2           A sees the token request when CREQa’ goes low.
3           After a type 1 delay from CREQa’ falling, A acknowledges the
            request by bringing CACKa low.
4           B sees the acknowledgement from A when CACKb’ goes low.
5           A switches CRDYa from tristate to high on the first H1 rising
            after CACKa falling.
6           A tristates CaD(7-0) on the first H1 rising after CACKa
            falling.
7           B switches CSTRBb from tristate to high after CACKb falling.
            B brings CREQb high after a type 1 delay from CACKb’ falling.
            A switches CREQa from tristate to high after CREQa’ goes high.
            B tristates CREQb after CREQb goes high.
            B switches CACKb from tristate to high after CREQb goes high.
            B tristates CRDYb on H1 rising after CREQb goes high.
            B drives the first byte onto CbD(7-0) on H1 rising after
            CREQb goes high.
            A sees the first byte on Ca’D(7-0).
            B brings CSTRBb low on the second H1 rising after CREQb
            rising.
20          A sees CSTRBa’ go low, signaling valid data.
21          A reads the data and brings CRDYa low.
22          B sees CRDYb’ go low, signaling data has been read.
23          B drives the second byte on CbD(7-0) after CRDYb’ goes low.
24          A sees the second byte on Ca’D(7-0).

† Event No. corresponds to numbers in the timing diagrams that follow.
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Table Continued on Next Page 




Table 8-4. Handshaking Events in Communication Port Intercommunication (Continued) 


Event No.†  Description

25          B brings CSTRBb high after CRDYb’ goes low.
28          B sees CRDYb’ go high.
29          B brings CSTRBb low after CRDYb’ goes high.
41          A reads the data and brings CRDYa low.
42          B sees CRDYb’ go low, signaling data has been read.
43          B drives the fourth byte on CbD(7-0) after CRDYb’ goes low.
44          A sees the fourth byte on CaD(7-0).
45          B brings CSTRBb high after CRDYb’ goes low.
46          A sees CSTRBa’ go high.
50          A sees CSTRBa’ go low, signaling valid data.
            B brings CSTRBb high after CRDYb’ goes low.
53          B brings CSTRBb high after CRDYb’ goes low.

† Event No. corresponds to numbers in the timing diagrams that follow.
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Table 8-4. Handshaking Events in Communication Port Intercommunication (Concluded) 


Event No.†  Description

            go high.
57          B drives the first byte of the next word onto CbD(7-0) after
            a type 2 delay from CRDYb’ falling (52).

† Event No. corresponds to numbers in the timing diagrams that follow.


These events are identified by event number in the following figures that de- 
scribe the communication port timing. 
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Figure 8-8. End of Token Transfer Sequence Followed by a Word Transfer and the Beginning of a Second Word Transfer
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Figure 8-9. End of a Word Transfer Followed by a Word Transfer

(Timing diagram. Signals shown include H1, H3, and CRDYa; both intervals
are labeled "Word Transfer From Proc. B to Proc. A".)


Figure 8-10. End of a Word Transfer Followed by an Idle State and Token Transfer
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Figure 8-11. End of a Word Transfer Followed by an Overlapping Token Transfer 
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NOTE: Events Mil —E@ are complements to the description in Table 8-4 on page 8-20 (i.e., if"a” is in the 
description, substitute "b” and vice versa, — CDa’ becomes CDb’; CDb’ becomes CBa’). 
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Figure 8-12. End of the Transfer of the Last Word in an Output FIFO Followed by an Idle Condition 
Until Another Word Is Available to Be Transferred 
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Figure 8-13. End of a Word Transfer Followed by a Not Ready Due to the Input FIFO Becoming Full, 
Continuing Once the Input FIFO Is No Longer Full 
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Figure 8—14 illustrates the state of the signals of a communication port ini- 
tialized by a reset as an output port (ports 0, 1, and 2 are configured as out- 
put ports at reset). For this case, CREQ and CRDY are in a high-impedance 
state. CACK and CSTRB are high, and undefined values are on CD(7-0).


Figure 8-14.  Post-Reset State for an Output Port 
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Figure 8-15 illustrates the state of the signals of a communication port
initialized by a reset as an input port (ports 3, 4, and 5 are configured as
input ports at reset). For this case, CREQ and CRDY are high. CACK, CSTRB,
and CD(7-0) are all in a high-impedance state.


Figure 8-15. Post-Reset State for an Input Port

[Timing diagram: H1, CREQ, CACK, CSTRB, CRDY]



8.7.2 Synchronizer Timing 


The synchronizers used in the port arbitration unit are of two types.
Type-one synchronizers cause delays of 1 to 2 machine clocks from the
time an input is received on a pin until the response appears on an output
pin (ignoring analog delays). Type-two synchronizers cause delays of 1.5
to 2.5 machine clocks.


Type-one synchronizers recognize an input when H1 is high, then pass it
through an H3-high/H1-high series of delays. The response is at the start
of the following H3 high.


The minimum type-one synchronizer delay of one machine clock occurs
when the input changes just before H1 goes low. This delay is shown in
Figure 8-16.


The maximum type-one synchronizer delay of two machine clocks occurs
when the input changes just after H1 goes low. This delay is shown in
Figure 8-17.


Figure 8-16. Type-One Synchronizer Minimum Delay
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Type-two synchronizers first recognize an input when H1 is “ then 
pass it through an H3-high/H1-high/H3-high series of delays. The response
is at the start of the following H1 high.


The minimum type-two synchronizer delay of 1.5 machine clocks occurs 
when the input changes just before H1 goes low. This delay is shown in 
Figure 8—18. 


The maximum type-two synchronizer delay of 2.5 machine clocks occurs 
when the input changes just after H1 goes low. This delay is shown in 
Figure 8—19. 


Using these two types of synchronizers, the synchronizer delays for the 
communication port signals are tabulated in Table 8—5. 
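
When budgeting these delays in a system design, it can help to convert the ranges quoted above from machine clocks to time. The following Python sketch is illustrative only: the function name is hypothetical, and the 50-ns machine clock period used in the example is an assumption, not a value taken from this section. Analog delays are ignored, as in the text.

```python
# Delay ranges in machine clocks, as stated in the text above.
SYNCHRONIZER_DELAY_CLOCKS = {
    "type-one": (1.0, 2.0),
    "type-two": (1.5, 2.5),
}


def synchronizer_delay_ns(sync_type, clock_period_ns):
    """Return the (minimum, maximum) synchronizer delay in nanoseconds."""
    low, high = SYNCHRONIZER_DELAY_CLOCKS[sync_type]
    return low * clock_period_ns, high * clock_period_ns


# Example with an assumed 50-ns machine clock period.
type_one_range = synchronizer_delay_ns("type-one", 50.0)
type_two_range = synchronizer_delay_ns("type-two", 50.0)
```

With the assumed period, a type-one synchronizer contributes between one and two periods of delay and a type-two synchronizer between one and a half and two and a half periods.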


Figure 8-18.  Type-Two Synchronizer Minimum Delay 
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Figure 8-19. Type- Two Synchronizer Maximum Delay 
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Table 8-5. | Communication Port Signals and Synchronizer Delays 


Input Signal to Output Signal           Min. Delay               Max. Delay
                                        (machine clock cycles)   (machine clock cycles)

CREQ to CACK
CACK to CREQ
CRDY to CD valid for a new word
CACK to CSTRB active
CRDY to CSTRB between
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This chapter provides technical information for two important TMS320C40 
(C40) functions: the direct memory access (DMA) coprocessor and the tim- 
ers. Both are on-chip parts of the ’C40 digital signal processor (DSP). The 
first nine major sections of this chapter cover the DMA coprocessor; the last 
section covers the timers. 


Section                                                          Page
9.1   Introduction ............................................. 9-2
9.2   DMA Coprocessor Functional Description ................... 9-3
9.3   DMA Coprocessor Registers ................................ 9-7
        DMA Channel Control Register ........................... 9-7
        DMA Channel Address and Index Registers ................ 9-16
        DMA Channel Transfer-Counter Register .................. 9-18
        DMA Channel Link-Pointer Register ...................... 9-19
9.4   DMA Coprocessor Channels in Unified and Split Mode ....... 9-20
9.5   DMA Coprocessor Internal Priority Schemes
9.6   CPU and DMA Coprocessor Arbitration
9.7   Data Transfer Modes
9.8   Autoinitialization
9.9   DMA Coprocessor and Interrupts
9.10  TMS320C40 Timers


Note: 


DMA Programming Examples in Chapter 12 


Besides the descriptions of DMA operation in this section, programming 
examples and explanations are provided in Chapter 12.
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9.1 Introduction 


The primary benefit of the DMA coprocessor is to maximize sustained CPU
performance by relieving the CPU of burdensome I/O duties.


The DMA coprocessor supports six DMA channels that perform transfers 
to and from anywhere in the processor’s memory map. For example, trans- 
fers can be made to/from on-chip memory, off-chip memory, and any of the 
six on-chip communication ports. The DMA coprocessor can automatically 
reinitialize its registers via linked lists stored in memory, allowing the DMA 
to run continuously without any intervention by the central processor unit 
(CPU). The DMA coprocessor can build up circular buffers in memory and 
perform linear and bit-reversed addressing. 


The DMA coprocessor provides you with an unprecedented level of
performance and flexibility for a DSP on-chip DMA coprocessor. The key
features of the ’C40 DMA coprocessor are:


□ six DMA channels for memory-to-memory transfers under unified
  mode; a special split mode supporting 12 DMA channels for
  communication port to/from memory transfers

□ autoinitialization of DMA channel control registers, via linked lists stored
  in memory, at the start of a block transfer

□ concurrent CPU and DMA coprocessor operation with DMA transfers
  at the same rate as the CPU (supported by separate internal DMA
  address and data buses)

□ source and destination address registers with variable indices allowing
  stepping through matrices by row or column

□ bit-reversed addressing for FFTs

□ synchronization of data transfers via external and internal interrupts
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9.2 DMA Coprocessor Functional Description 


The TMS320C40 DMA coprocessor improves data transfer rates in
systems that must perform:


□ memory-to-memory transfers

□ data transfers from an I/O device to memory

□ data transfers from memory to an I/O device

□ transfers of data between the on-chip communication ports and
  memory

□ data transfer of a single value to a block of memory for memory fill and
  initialization


The DMA coprocessor can transfer data in a linear fashion or in a
bit-reversed fashion for FFT applications; it can transfer matrix data in a
row or column fashion.


The DMA coprocessor is a self-programming device that allows data
transfers to occur without any intervention from the CPU. This allows data
to be moved onto and off of the ’C40 without any CPU distraction. The result
is a processor with a concurrent I/O rate that can keep up with the CPU’s
high computation rate. The address map of the DMA coprocessor registers
is shown in Figure 9-1. The major registers of the DMA coprocessor are:

□ control register

□ source register

□ source index register

□ destination register

□ destination index register

□ transfer counter register

□ link pointer register


Subsections that describe these are listed in Figure 9-2 and in Section 9.3.


The DMA coprocessor has dedicated on-chip DMA address and DMA data
buses. All accesses made by the six DMA channels are arbitrated in the
DMA coprocessor and take place over these dedicated buses. The DMA
channels can run constantly or may be triggered by an external or internal
interrupt, including an interrupt generated by the on-chip timers and
communication ports.
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Figure 9-1. | DMA Coprocessor Memory Map 
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Figure 9-2. Subsections Where DMA Channel Registers Are Described 


Memory                                                        Described
Address      DMA Register                                     in Section   On Page

010 00z0h    DMA channel x control register                   9.3.1        9-7
010 00z1h    DMA channel x source address                     9.3.2        9-16
010 00z2h    DMA channel x source address index               9.3.2        9-16
010 00z3h    DMA channel x transfer counter                   9.3.3        9-18
010 00z4h    DMA channel x destination address                9.3.2        9-16
010 00z5h    DMA channel x destination address index          9.3.2        9-16
010 00z6h    DMA channel x link pointer                       9.3.4        9-19
010 00z7h    DMA channel x auxiliary transfer counter         9.3.3        9-18
010 00z8h    DMA channel x auxiliary link pointer             9.3.4        9-19


x = Channel number (e.g., all are 1 for channel 1, all 2 for channel 2, etc.).

z = Corresponding hexadecimal digit for channel address (e.g., substitute
“A” for DMA channel 0; “B” for DMA channel 1, etc. See Figure 9-1).
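
The address pattern in Figure 9-2 can be expressed as a small calculation. The Python sketch below is one reading of that pattern, not a definitive map: the function and dictionary names are hypothetical, and the word offset assigned to each register is inferred from the order of the section references in Figure 9-2 and from the autoinitialization example in subsection 9.3.4, which places the control register at 010 00A0h for channel 0.

```python
# Channel 0 control register, written 010 00A0h in this manual.
DMA_BASE_ADDRESS = 0x001000A0

# The "z" digit advances A, B, C, ... so each channel is 10h words apart.
CHANNEL_STRIDE = 0x10

# Word offsets within one channel's register block (inferred ordering).
REGISTER_OFFSETS = {
    "control": 0,
    "source_address": 1,
    "source_index": 2,
    "transfer_counter": 3,
    "destination_address": 4,
    "destination_index": 5,
    "link_pointer": 6,
    "aux_transfer_counter": 7,
    "aux_link_pointer": 8,
}


def dma_register_address(channel, register):
    """Return the memory-mapped address of a DMA channel register."""
    if not 0 <= channel <= 5:
        raise ValueError("DMA channels are numbered 0-5")
    return DMA_BASE_ADDRESS + CHANNEL_STRIDE * channel + REGISTER_OFFSETS[register]


# Channel 1 uses the digit "B": its transfer counter is at 010 00B3h.
channel1_counter = dma_register_address(1, "transfer_counter")
```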


For example, if a block of data is to be transferred from one region in memory
to another region in memory:

1) The source address register of a DMA channel is loaded with the
   address of the source memory location.

2) The destination address register of the same DMA channel is loaded
   with the address of the destination memory location.

3) The transfer counter is loaded with the number of words to be
   transferred.

4) If sequential memory accesses are required, the source address
   index register as well as the destination address index register would
   be set to 1.

5) The appropriate modes can be set up to synchronize the DMA
   coprocessor reads and writes to interrupts via the DMA channel control
   register.

6) Then, the DMA coprocessor can be started via the DMA START field in
   the DMA channel control register.
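
The six steps above can be sketched in Python against a plain list that stands in for one channel's register block. This is a hedged illustration, not a driver: the function name, the list-based register image, and the two memory addresses in the example are hypothetical. The field positions (TRANSFER MODE in bits 2-3, START in bits 22-23) follow Table 9-1, and TRANSFER MODE = 01 is chosen here only so that the channel stops when the counter reaches zero.

```python
# Word offsets within a channel's register block (assumed ordering).
CONTROL, SRC, SRC_INDEX, COUNTER, DST, DST_INDEX = range(6)

TRANSFER_MODE_SHIFT = 2   # TRANSFER MODE occupies bits 2-3
START_SHIFT = 22          # START occupies bits 22-23


def program_block_copy(regs, src, dst, n_words):
    """Fill a register image for a sequential memory-to-memory block copy."""
    regs[SRC] = src              # step 1: source address
    regs[DST] = dst              # step 2: destination address
    regs[COUNTER] = n_words      # step 3: number of words to transfer
    regs[SRC_INDEX] = 1          # step 4: sequential reads
    regs[DST_INDEX] = 1          #         and sequential writes
    # Step 5: SYNC MODE is left at 00 (no interrupt synchronization).
    # TRANSFER MODE = 01 terminates the transfer on a zero count.
    control = 0b01 << TRANSFER_MODE_SHIFT
    # Step 6: START = 11 starts the channel; the control word is written last.
    control |= 0b11 << START_SHIFT
    regs[CONTROL] = control
    return regs


# Hypothetical source and destination addresses, 64-word block.
regs = program_block_copy([0] * 6, src=0x002FF800, dst=0x002FFC00, n_words=64)
```

Writing the control word last means the address, index, and counter values are already in place when the START field takes effect.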


A DMA transfer consists of two steps:

1) The source data value is read by the DMA channel and stored in a
   temporary register.

2) The temporary register value is written to the destination address.

During every data write, the transfer counter is decremented. The block
transfer can be terminated when the transfer counter goes to zero and the
write of the last transfer is complete.


After a read by the DMA channel, the source-index register is added to the 
source-address register. After a write by the DMA channel, the destination- 
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index register is added to the destination-address register. (Both index 
registers contain signed values.) This allows for variable step sizes or con- 
tinual reads and/or writes from/to memory. In the case of an index register 
equaling zero, the DMA coprocessor transfers data from/to a fixed location. 
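
The read, write, index, and counter behavior described above can be modeled in a few lines. The following Python sketch is a simplified software model under stated assumptions: memory is a dictionary, addressing is linear only, registers are 32 bits wide, and the block ends when the counter reaches zero. The function names and the addresses in the examples are hypothetical.

```python
def to_signed32(value):
    """Interpret a 32-bit register value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def run_block_transfer(memory, src, src_index, dst, dst_index, count):
    """Simulate one counter-terminated block transfer (count must be nonzero).

    Returns the final source and destination addresses.
    """
    while True:
        temp = memory.get(src, 0)                           # read into temporary register
        src = (src + to_signed32(src_index)) & 0xFFFFFFFF   # source index added after the read
        memory[dst] = temp                                  # write the temporary register
        dst = (dst + to_signed32(dst_index)) & 0xFFFFFFFF   # destination index added after the write
        count = (count - 1) & 0xFFFFFFFF                    # counter decremented on every write
        if count == 0:
            return src, dst


# A 3 x 4 matrix stored row by row at hypothetical address 0x100.
memory = {0x100 + i: i for i in range(12)}

# A source index of 4 steps down column 1; a destination index of 1 packs it.
run_block_transfer(memory, 0x101, 4, 0x200, 1, 3)
column = [memory[0x200 + i] for i in range(3)]

# A source index of 0 reads a fixed location, which gives a memory fill.
memory[0x300] = 0xAA
run_block_transfer(memory, 0x300, 0, 0x400, 1, 4)
fill = [memory[0x400 + i] for i in range(4)]
```

The two examples correspond to the variable-step and fixed-location cases mentioned in the text: stepping through a matrix by column, and filling a block from a single value.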


At the completion of a block transfer, the DMA coprocessor can be
programmed to do several things:


□ most importantly, autoinitialize itself at the start of the next block
  transfer. Each DMA channel can read new control register values from
  memory (as well as the other registers in Figure 9-2), load these values
  into its register block, and, according to the values loaded, begin another
  block transfer. This autoinitialization is done without any intervention
  by the CPU.

□ generate an interrupt to signal that the block transfer is complete

□ stop until reprogrammed


A special split-mode allows the DMA channels to have the source and desti- 
nation paths split and bound to a communication port. In this mode, the 
DMA-channel source path (source-address register, source-index regis- 
ter, transfer-counter register, and link-pointer register) forms the primary 
split channel and is used to move data from a location in the processor’s 
memory map to a communication port. The DMA-channel destination 
path (destination-address register, destination-index register, auxiliary 
transfer-counter register, and auxiliary link-pointer register) is the auxiliary 
split channel and is used to move data from the same communication port 
to a location in the processor’s memory map. 


Note: DMA Coprocessor Programming Examples in Chapter 12 


Besides the descriptions of DMA coprocessor operation in this section, pro- 
gramming examples and explanations are provided in Chapter 12. 
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9.3 DMA Coprocessor Registers 


The DMA coprocessor has nine registers designated as follows (for
location, see Figure 9-2 on page 9-5):


□ DMA channel control register (subsection 9.3.1)

□ DMA-channel source-address register (subsection 9.3.2, page 9-16)

□ DMA-channel source-address-index register (subsection 9.3.2, page
  9-16)

□ DMA-channel destination-address register (subsection 9.3.2, page
  9-16)

□ DMA-channel destination-address-index register (subsection 9.3.2,
  page 9-16)

□ DMA-channel transfer-count register (subsection 9.3.3 on page 9-18)

□ DMA-channel auxiliary-transfer-count register (subsection 9.3.3 on
  page 9-18)

□ DMA-channel link-pointer register (subsection 9.3.4 on page 9-19)

□ DMA-channel auxiliary-link-pointer register (subsection 9.3.4 on page
  9-19)


Each DMA channel has one of each of these registers, discussed in the
following paragraphs.


9.3.1 DMA Channel Control Register 


The format of the DMA channel control register is shown in Figure 9-3. 
Table 9—1 defines the register bits, the bit names, and the bit functions. 


At reset, the DMA channel control register is set to zero. This makes the 
DMA channel lower priority than the CPU, sets up the source address and 
destination address to be calculated via linear addressing, and configures 
the DMA channel in the unified mode. 


When an external interrupt is used for DMA coprocessor transfer synchroni- 
zation, the CPU is responsible for configuring external interrupts as edge- 
or level-triggered interrupts (as set in the applicable FUNCx and TYPEx bits 
of the interrupt flag register (subsection 3.1.10 on page 3-12)). 
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Figure 9-3. | DMA Channel Control Register 


[Register diagram: bits 31-0 of the DMA channel control register, each
field marked with its access attributes; the fields are defined in Table 9-1]


R   Bit may be read.

W   Bit may be written.

S   Bit is shadowed during autoinitialization (no changes take
    effect until autoinitialization is complete).

A   Bit is auxiliary for autoinitialization.

Reserved bits are also marked in the diagram.


Table 9-1. DMA Channel Control Register Bit Definitions


Bits   Mnemonic              R/W   Description

0-1    DMA PRI                     DMA coprocessor priority. Defines the arbitration
                                   rules to be used when a DMA channel and the CPU
                                   are requesting the same resource. Affects all DMA
                                   coprocessor modes. Rules listed in Table 9-2,
                                   page 9-14.

2-3    TRANSFER MODE               Defines the transfer mode used by the DMA channel.
                                   Affects unified mode and the primary channel in
                                   split mode. Bits defined in Table 9-3 on page 9-14.

4-5    AUX TRANSFER MODE           Defines the transfer mode used by the DMA channel.
                                   Affects the auxiliary channel in split mode only.
                                   Bits defined in Table 9-3 on page 9-14.


Table continued on next page
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Table 9-1. DMA Channel Control Register Bit Definitions (Continued) 


Bits   Mnemonic              R/W   Description

6-7    SYNC MODE             R/W   Determines the mode of synchronization to be used
                                   when performing data transfers, as shown in
                                   Table 9-4 on page 9-15.

                                   Note: If a DMA channel is interrupt driven for both
                                   reads and writes, and the interrupt for the write
                                   comes before the interrupt for the read, the
                                   interrupt for the write is latched by the DMA
                                   channel. After the read is complete, the write can
                                   be executed.

8      AUTOINIT STATIC       R/W   If bit = 0, the link pointer is incremented during
                                   autoinitialization.
                                   If bit = 1, the link pointer is not incremented (it
                                   is static) during autoinitialization.
                                   This affects unified mode and primary channel in
                                   split mode. It is useful to keep the auxiliary link
                                   pointer constant when autoinitializing from the
                                   on-chip communication ports or other
                                   stream-oriented devices (such as first-in first-out
                                   (FIFO) memory buffers).

9      AUX AUTOINIT STATIC   R/W   Acts the same as for the AUTOINIT STATIC mode
                                   above, except that this affects the auxiliary
                                   channel in split mode only.

10     AUTOINIT SYNC         R/W   Has an effect only in the DMA coprocessor sync mode
                                   (bits 6-7 above). Affects the interrupt that is
                                   enabled by the DMA interrupt enable register (see
                                   Figure 3-4, page 3-8) used for DMA reads:
                                   If bit = 0, the interrupt is ignored, and the
                                   autoinitialization reads are not synchronized with
                                   any interrupt signals.
                                   If bit = 1, then the interrupt is recognized and is
                                   also used to synchronize the autoinitialization
                                   reads.
                                   This affects the unified mode and the primary
                                   channel in split mode (see bit 14, SPLIT MODE). The
                                   effect of this bit and the SYNC MODE bits in
                                   autoinitialization is summarized in Table 9-9 on
                                   page 9-37.

11     AUX AUTOINIT SYNC           Acts the same as the AUTOINIT SYNC bit above except
                                   that it affects DMA-coprocessor write
                                   autoinitialization sync in unified mode and the
                                   auxiliary channel in split mode. The effect of this
                                   bit and the SYNC MODE bits in autoinitialization is
                                   summarized in Table 9-9 on page 9-37.


Table continued on next page
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Table 9-1. DMA Channel Control Register Bit Definitions (Continued) 


Bits   Mnemonic              R/W   Description

12     READ BIT REV          R/W   If bit = 0, the source address is modified using
                                   32-bit linear addressing.
                                   If bit = 1, the source address is modified using
                                   24-bit bit-reversed addressing.
                                   Affects unified mode and primary channel in split
                                   mode.

13     WRITE BIT REV               If bit = 0, the destination address is modified
                                   using 32-bit linear addressing.
                                   If bit = 1, the destination address is modified
                                   using 24-bit bit-reversed addressing.
                                   Affects unified mode and auxiliary channel in split
                                   mode.

14     SPLIT MODE                  This controls the DMA coprocessor mode of operation.
                                   If bit = 0, DMA transfers are memory to memory.
                                   This is referred to as unified mode.
                                   If bit = 1, split mode is entered with the DMA split
                                   into two channels, allowing a single DMA channel
                                   to perform memory-to-communication-port and
                                   communication-port-to-memory transfers.
                                   The split mode may be modified by autoinitialization
                                   in unified mode or by autoinitialization by the
                                   auxiliary channel in split mode. Split mode is
                                   further described in Section 9.4.

15-17  COM PORT                    These bits define a communication port (000₂ to
                                   101₂) to be used for DMA transfers.
                                   If SPLIT MODE = 0, then COM PORT has no effect on
                                   the operation of the DMA channel.
                                   If SPLIT MODE = 1, then COM PORT defines which of
                                   the six communication ports to use with the DMA
                                   channel.
                                   The COM PORT may be modified by autoinitialization
                                   in unified mode or by autoinitialization by the
                                   auxiliary channel in split mode.


Table continued on next page
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Table 9-1. DMA Channel Control Register Bit Definitions (Continued) 


Bits   Mnemonic              R/W   Description

18     TCC                   R/W   Transfer counter interrupt control.
                                   If TCC = 1, a DMA channel interrupt pulse is sent
                                   to the CPU after the transfer counter makes a
                                   transition to zero and the write of the last
                                   transfer is complete. If enabled, the corresponding
                                   DMA interrupt (DMA INT0-INT5) occurs as shown in
                                   Figure 3-8, p. 3-16.
                                   If TCC = 0, a DMA channel interrupt pulse is not
                                   sent to the CPU when the transfer counter makes a
                                   transition to zero.
                                   Affects unified mode and the primary channel in
                                   split mode. DMA channel interrupts to the CPU are
                                   edge triggered.

19     AUX TCC                     Auxiliary transfer counter interrupt control.
                                   If bit = 1, a DMA channel interrupt pulse is sent
                                   to the CPU after the auxiliary transfer counter
                                   makes a transition to zero and the write of the
                                   last transfer is complete. If enabled, the
                                   corresponding DMA interrupt (DMA INT0-INT5) occurs
                                   as shown in Figure 3-8, p. 3-16.
                                   If bit = 0, a DMA channel interrupt pulse is not
                                   sent to the CPU when the auxiliary transfer counter
                                   makes a transition to zero.
                                   Affects the auxiliary channel in split mode only.

20     TCINT FLAG                  Transfer counter interrupt flag.
                                   This flag is set to 1 whenever the transfer counter
                                   makes a transition to zero and the write of the
                                   last transfer is completed. Whenever the DMA
                                   channel control register is read, this flag is
                                   cleared unless the flag is being set by the DMA in
                                   the same cycle as the read (in such case, TCINT is
                                   not cleared).
                                   The TCINT FLAG is affected by the unified mode and
                                   the primary channel in split mode.


Table continued on next page
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Table 9-1. DMA Channel Control Register Bit Definitions eee 


Bits   Mnemonic              R/W   Description

21     AUX TCINT FLAG              Auxiliary transfer counter interrupt flag.
                                   This flag is set to 1 whenever the auxiliary
                                   transfer counter makes a transition to zero and the
                                   write of the last transfer is completed. Whenever
                                   the DMA control register is read, this flag is
                                   cleared unless the flag is being set by the DMA
                                   coprocessor in the same cycle as the read (in such
                                   case AUX TCINT is not cleared). The AUX TCINT FLAG
                                   is affected by the auxiliary channel in split mode.
                                   Since only one DMA-channel interrupt is available
                                   for a DMA channel, you can determine what event had
                                   set the interrupt by examining the TCINT FLAG and
                                   the AUX TCINT FLAG.

22-23  START                       Starts and stops the DMA channel in several
                                   different ways (listed in Table 9-5, page 9-15).
                                   START affects the unified mode and the primary
                                   channel in split mode. If used to hold a channel in
                                   the middle of an autoinit sequence, the START and
                                   AUX START bits will hold the autoinit sequence.
                                   If the START or AUX START bits are being modified
                                   by the DMA channel (for example, to force a halt
                                   code of 10₂ on a transfer-counter terminated block
                                   transfer) and a write is being performed by an
                                   external source to the DMA channel control
                                   register, internal modification of the START or
                                   AUX START bits by the DMA channel has priority. See
                                   TRANSFER MODE bits value of 01₂ (Table 9-3).

24-25  AUX START             R/W   Starts and stops the DMA channel in several
                                   different ways (listed in Table 9-5, page 9-15).
                                   AUX START affects the auxiliary channel in split
                                   mode only.

26-27  STATUS                      Indicates the status of the DMA channel as listed
                                   in Table 9-6, page 9-16. STATUS is updated in the
                                   unified mode and by the primary channel in the
                                   split mode. Updates are done every cycle.
                                   The STATUS and AUX STATUS bits (Table 9-6) are used
                                   to determine the current status of the DMA channels
                                   and to determine if the DMA channel has halted or
                                   has been reset after writing to the START or AUX
                                   START bits.


Table concluded on next page
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Table 9-1. DMA Channel Control Register Bit Definitions (Concluded) 


Bits   Mnemonic              R/W   Description

28-29  AUX STATUS                  Indicates the status of the DMA channel as listed
                                   in Table 9-6, page 9-16. AUX STATUS is updated by
                                   the auxiliary channel in split mode only. Updates
                                   are done every cycle.

30     PRIORITY MODE         R/W   Priority mode of DMA channel access:
                                   0 = Rotating priority as shown in Section 9.5 (on
                                   page 9-22).
                                   1 = Fixed priority as shown in Section 9.5.
                                   This bit is available only at DMA channel 0 (zero).

31     Reserved
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Table 9-2. | DMA PRI Bits and CPU/DMA Arbitration Rules 
DMA PRI
(bits 1-0)   Description

0 0          DMA coprocessor access is lower priority than the CPU access. If
             the DMA channel and the CPU are requesting the same resource, then
             the CPU will proceed. Bits are set this way at reset.

0 1          If the DMA channel and the CPU are requesting the same resource,
             then the CPU will proceed. Then, after the CPU access is complete,
             if the DMA coprocessor and CPU are again requesting the same
             resource, the DMA coprocessor will proceed. This priority rule
             provides a fair arbitration scheme by alternating CPU accesses
             with a DMA channel’s access.

1 0          Reserved.

1 1          DMA coprocessor access is higher priority than the CPU access. If
             the DMA channel and the CPU are requesting the same resource, then
             the DMA will proceed.


Table 9-3. TRANSFER MODE and AUX TRANSFER MODE Field Description

TRANSFER MODE
(bits 3-2)   Description

0 0          Transfers are not terminated by the transfer counter, and no
             autoinitialization is performed. TCINT (transfer counter
             interrupt) can still be used to cause an interrupt when the
             transfer counter makes a transition to zero. The DMA channel
             continues to run. Note that the address continues incrementing
             while the transfer count rolls over to its maximum value of
             0FFFF FFFFh.

0 1          Transfers are terminated by the transfer counter. No
             autoinitialization is performed. A halt code of 10₂ is placed in
             the START field when transfers are completed.

1 0          Autoinitialization is performed when the transfer counter goes to
             zero without waiting for CPU intervention.

1 1          The DMA channel is autoinitialized when the CPU restarts the DMA
             coprocessor by using the DMA register in the CPU. When the
             transfer counter goes to zero, operation is halted until the CPU
             starts the DMA coprocessor by using the START field in the DMA
             channel control register (bits 22-23 and 24-25, Table 9-5). A
             halt code of 10₂ is placed in the START field by the DMA
             coprocessor.
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Table 9-4. | SYNCH MODE Field Description 


SYNC MODE
(bits 7-6)   Description

0 0          No synchronization. Interrupts are ignored.

0 1          Source synchronization. A read will not be performed until an
             enabled interrupt occurs.

1 0          Destination synchronization. A write will not be performed until
             an enabled interrupt occurs.

1 1          Source and destination synchronization. A read is performed when
             an enabled interrupt occurs. Then, a write is performed when an
             enabled interrupt occurs. The interrupts used are specified by
             the DMA READ and DMA WRITE fields of the DMA interrupt enable
             (DIE) register (subsection 3.1.8 on page 3-8).


Table 9-5. START and AUX START Field Description

Value        Description

0 0          DMA channel reset. DMA channel read or write cycles in progress
             are completed (not aborted); any data read is ignored. Any
             pending (not started) read or write is canceled. The DMA channel
             is reset so that when it starts, a new transaction begins; that
             is, a read is performed. In this start mode, stopping is
             immediate with no other registers loaded.

0 1          Halts the DMA channel on the first available read or write
             boundary. If the read or write has begun, the read or write is
             completed before stopping (i.e., in the middle or at the end of a
             DMA channel transfer). If a read or write has not begun, no read
             or write is started. In this start mode, stopping is immediate
             with no other registers loaded.

1 0          Halts the DMA channel on the first available transfer boundary.
             If a DMA transfer has begun, the entire transfer is completed,
             including both cycles (both read and write operations), before
             stopping. If a transfer has not begun, none is started. In this
             start mode, stopping is immediate with no other registers loaded.

1 1          DMA start. Writing 11₂ to this field starts the DMA process using
             the values in the channel’s DMA channel registers (Figure 9-1).
             If the DMA is in autoinitialization, all DMA registers are loaded
             before starting the operation. The DMA coprocessor starts from
             reset if previously reset (START bits = 00₂) or restarts from the
             previous state if previously halted (START bits = 01₂ or 10₂).
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Table 9-6. © STATUS and AUX STATUS Field Description 


Value        Description

0 0          DMA process is being held between DMA transfers (between the
             write of one transfer and the read of the next). This is the
             value at RESET, after a halt on a transfer boundary, or after a
             block transfer.

0 1          DMA process is being held (for any reason) in the middle of a DMA
             transfer; that is, in the middle of the read/write operation.

1 0          Reserved.

1 1          DMA channel is not being held or reset.
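
The field layout of Table 9-1 lends itself to a small pack-and-unpack helper. The Python sketch below is an illustration under stated assumptions: the helper names are hypothetical, and the positions of the single-bit fields at bits 8, 9, 11, and 30 are inferred from the ordering of Table 9-1 rather than quoted from it. The remaining positions follow the bit numbers given in the tables and text of this section.

```python
# Field positions as (lowest bit, width in bits).
CONTROL_FIELDS = {
    "DMA_PRI": (0, 2),
    "TRANSFER_MODE": (2, 2),
    "AUX_TRANSFER_MODE": (4, 2),
    "SYNC_MODE": (6, 2),
    "AUTOINIT_STATIC": (8, 1),        # inferred position
    "AUX_AUTOINIT_STATIC": (9, 1),    # inferred position
    "AUTOINIT_SYNC": (10, 1),
    "AUX_AUTOINIT_SYNC": (11, 1),     # inferred position
    "READ_BIT_REV": (12, 1),
    "WRITE_BIT_REV": (13, 1),
    "SPLIT_MODE": (14, 1),
    "COM_PORT": (15, 3),
    "TCC": (18, 1),
    "AUX_TCC": (19, 1),
    "TCINT_FLAG": (20, 1),
    "AUX_TCINT_FLAG": (21, 1),
    "START": (22, 2),
    "AUX_START": (24, 2),
    "STATUS": (26, 2),
    "AUX_STATUS": (28, 2),
    "PRIORITY_MODE": (30, 1),         # inferred position
}


def get_field(control, name):
    """Extract one field from a 32-bit control register value."""
    low, width = CONTROL_FIELDS[name]
    return (control >> low) & ((1 << width) - 1)


def set_field(control, name, value):
    """Return the control value with one field replaced."""
    low, width = CONTROL_FIELDS[name]
    if value < 0 or value >> width:
        raise ValueError(f"{name} holds only {width} bit(s)")
    mask = ((1 << width) - 1) << low
    return (control & ~mask & 0xFFFFFFFF) | (value << low)


# Counter-terminated transfer, source synchronization, interrupt on
# completion, then start (START = 11).
control = 0
control = set_field(control, "TRANSFER_MODE", 0b01)
control = set_field(control, "SYNC_MODE", 0b01)
control = set_field(control, "TCC", 1)
control = set_field(control, "START", 0b11)

# A later halt on a transfer boundary changes only the START field.
halted = set_field(control, "START", 0b10)
```

Keeping the layout in one table makes it easy to adjust if a position differs on a given device revision.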


9.3.2 DMA Channel Address and Index Registers 


As shown in Figure 9-4, both the DMA coprocessor source-address and
destination-address registers have an associated index register. After each
DMA channel read (source address) or write (destination address), the
corresponding (source or destination) address generator adds the index
register to the address register and places the result in the address register.
In this way, the address register acts as an accumulator because it retains
the sum of itself and its index register.


Address Register + Index Register → Address Register

The values in these registers are undefined at reset.


Depending upon bits 12 and 13 (READ BIT REV and WRITE BIT REV) of
the DMA channel control register, the addition may be either:


□ linear (normal addition): READ BIT REV = 0 or WRITE BIT REV = 0,
  or

□ bit reversed (reverse carry propagation): READ BIT REV = 1 or
  WRITE BIT REV = 1.


Both index values (source or destination) are signed values.
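
The two kinds of addition can be compared with a short model. The Python sketch below implements reverse-carry addition by reversing the bit order of both operands, adding normally, and reversing the sum. It is a minimal sketch, not a description of the hardware: the function names are hypothetical, and leaving the upper eight address bits unchanged in bit-reversed mode is an assumption based on the 24-bit width quoted in Table 9-1.

```python
def bit_reverse(value, width):
    """Reverse the order of the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def next_address(address, index, bit_reversed=False, width=24):
    """Return the address register value after the index is added."""
    if not bit_reversed:
        # Linear addressing: ordinary 32-bit addition; index may be negative.
        return (address + index) & 0xFFFFFFFF
    mask = (1 << width) - 1
    # Reverse carry propagation: add in the bit-reversed domain.
    reversed_sum = (bit_reverse(address & mask, width)
                    + bit_reverse(index & mask, width)) & mask
    # Assumption: bits above `width` are left unchanged.
    return (address & ~mask & 0xFFFFFFFF) | bit_reverse(reversed_sum, width)


# For an 8-point FFT, an index of half the FFT size (4) visits the
# buffer in bit-reversed order.
address = 0
sequence = []
for _ in range(8):
    sequence.append(address)
    address = next_address(address, 4, bit_reversed=True)
```

In this example the buffer starts at address 0; an index equal to half the FFT length is the usual choice for producing the bit-reversed ordering.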
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Figure 9-4. | DMA-Coprocessor Address Generation 
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(a) Source Address Register Operation 
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Write Bit-Reverse Bit 
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Dest. Address 0 
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(b) Destination Address Register Operation 
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9.3.3 DMA Channel Transfer-Counter and Auxiliary-Transfer-Count 
Registers


These registers contain the number of words to be transmitted.


Figure 9-5 shows the six transfer counters and the six auxiliary transfer
counters. Each auxiliary transfer counter is used when the DMA channel is
in split mode (described in Section 9.4 on page 9-20). The values in these
registers are set to zero at reset.


The counters are decremented after completing the address fetch for the
write portion of a transfer. The TCINT FLAG and AUX TCINT FLAG (bits 20
and 21 of the DMA channel control register, Figure 9-3 on page 9-8) are
not set until the counter is decremented and the write of the last transfer is
completed. Correspondingly, the interrupt will not be seen by the CPU
interrupt controller until the transfer counter is decremented and the write of
the last transfer is completed.


The decrementer checks for equality with zero after the decrement is
performed. As a result, if the count register has a value of 1, then the DMA
channel can be halted after only one transfer is performed. The count is
treated as an unsigned integer. Transfers may be halted when a zero count is
detected after a decrement. If the DMA coprocessor channel is not halted
after the transfer counter reaches zero, the counter will continue
decrementing below zero.


Figure 9-5. DMA Coprocessor Transfer-Count Registers

[Register diagram: Transfer Counter x, where x = DMA channel number (0-5)]
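
Because the zero check follows the decrement and the count is unsigned, a count of 1 gives one transfer while a count of 0 wraps around before zero is detected. The Python sketch below models that behavior; the function names are hypothetical, and the 32-bit counter width is an assumption based on the rollover value 0FFFF FFFFh quoted in Table 9-3.

```python
COUNTER_MASK = 0xFFFFFFFF   # assumed 32-bit unsigned counter


def decrement_transfer_counter(count):
    """Return (new_count, zero_detected) for one completed write."""
    new_count = (count - 1) & COUNTER_MASK   # decrement first
    return new_count, new_count == 0         # then compare with zero


def transfers_until_zero(initial_count):
    """Number of transfers performed before zero is first detected."""
    # A count of 0 wraps to the maximum value on the first decrement.
    return initial_count if initial_count else COUNTER_MASK + 1


one_transfer = decrement_transfer_counter(1)
wrapped = decrement_transfer_counter(0)
```

A practical consequence is that loading 0 into the counter does not mean "no transfer"; in this model it requests the longest possible block.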


9.3.4 DMA-Channel Link-Pointer and Auxiliary-Link-Pointer Registers 


The link pointers specify the address from which to load the new DMA
channel register values when autoinitialization is performed. When a channel
has exhausted its counter, it will (if appropriately configured) use the link
pointer to reload itself. Figure 9-6 illustrates the DMA coprocessor link
address registers. The values in these registers are undefined at reset.


For example, under autoinitialization, the steps to load the channel registers
for DMA channel 0 (as shown in Figure 9-1 on page 9-4) would be:

1) Get link pointer for next DMA operation. Pointer is memory address
   containing contents of first DMA channel 0 register (channel control
   register as shown in Figure 9-1 on page 9-4).

2) Bring in contents pointed to by pointer and write to address 010 00A0h
   (first word of DMA channel 0 registers as shown in Figure 9-1).

3) Increment link pointer. (Skip this step if AUTOINIT STATIC bit = 1.)

4) Bring in next word and write to address 010 00A1h.

5) Repeat until entire block of registers is loaded for DMA channel 0.
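
The load sequence in steps 1 through 5 can be sketched as a loop. The Python model below is illustrative only: the function name, the `read_word` callback, and all addresses and register values are hypothetical, and the number of words loaded is passed in as a parameter (7 in the examples) because it is an assumption rather than a figure stated in this subsection. The model also does not replace its working pointer when the link pointer register itself is loaded.

```python
from collections import deque


def autoinitialize(read_word, registers, link_pointer, n_words,
                   autoinit_static=False):
    """Load n_words register values fetched through the link pointer.

    read_word(address) returns the word at that address. Returns the
    link pointer value after the load.
    """
    for offset in range(n_words):
        registers[offset] = read_word(link_pointer)   # steps 2 and 4
        if not autoinit_static:
            link_pointer += 1                         # step 3
    return link_pointer


# Case 1: register values stored as a block in ordinary memory.
block = [0x00C00008, 0x1000, 1, 16, 0x2000, 1, 0x3010]
memory = {0x3000 + i: word for i, word in enumerate(block)}
regs = [0] * 7
end_pointer = autoinitialize(memory.__getitem__, regs, 0x3000, n_words=7)

# Case 2: a stream-oriented source such as a FIFO. The pointer stays
# static, and each read of the same address returns the next word.
fifo = deque([0x00C00004, 0x1000, 1, 8, 0x2000, 1, 0])
fifo_regs = [0] * 7
autoinitialize(lambda address: fifo.popleft(), fifo_regs, 0x5000,
               n_words=7, autoinit_static=True)
```

The second case shows why a static pointer is useful for a FIFO or communication port, where successive words arrive at a single address.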


Figure 9-6. DMA Coprocessor Link Pointer Registers
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9.4 DMA Channels in Unified and Split Modes 


Unified and split modes are depicted in separate diagrams (Figure 9-7 and
Figure 9-8 on the next page). The split mode transforms one DMA channel
into two DMA channels:

- Primary channel: dedicated to reading data from a location in the
  memory map (external/internal) and writing it to a communication port

- Auxiliary channel: dedicated to receiving data from a communication
  port and writing it to a location in the memory map


To accommodate the six communication ports, all six DMA channels (DMA
channels 0-5) can support this split mode.


The SPLIT MODE bit (bit 14 of the DMA channel control register,
Figure 9-3) controls the DMA unified or split mode:

- For unified mode (Figure 9-7): Set SPLIT MODE bit to 0 (zero)

- For split mode (Figure 9-8): Set SPLIT MODE bit to 1


The COM PORT field of the DMA channel control register (bits 15-17 as
shown in Figure 9-3) defines which communication port is used (port 0-5).
Figure 9-8 shows typical operations using one communication port.

- The transfer counter register controls the primary channel transfers.

- The auxiliary transfer counter register controls the auxiliary channel
  transfers (both these registers are shown in Figure 9-1, page 9-4).


DMA channel arbitration in split mode is described in subsection 9.5.3 on 
page 9-24. 
Figure 9-7. Typical Unified Mode DMA Channel Configuration

[Figure: memory pointed to by the DMA source address register, and memory
pointed to by the DMA destination address register.]


Figure 9-8. Typical Split-Mode DMA Configuration

[Figure: memory pointed to by the DMA source address register, the primary
channel, the auxiliary channel, and the communication port signals CREQ,
CACK, CSTRB, CRDY, and CD(7-0).]




9.5 DMA Coprocessor Internal Priority Schemes 


Within the DMA coprocessor, two priority schemes are used to designate
which channel is serviced next:

- a fixed priority scheme, with channel 0 always having the highest priority
  and channel 5 the lowest

- a rotating priority scheme, which places the just-serviced channel at the
  bottom of the priority list


Select the desired scheme by setting bit 30 (PRIORITY MODE) of DMA
channel 0’s DMA channel control register (Figure 9-3 and Table 9-1 on
page 9-8):

- PRIORITY MODE = 0: rotating priority

- PRIORITY MODE = 1: fixed priority


9.5.1 Fixed Priority Scheme 


This scheme provides a fixed (unchanging) priority for each channel as
follows:

    Highest priority    0
                        1
                        2
                        3
                        4
    Lowest priority     5


To set up this scheme, set the PRIORITY MODE bit (bit 30) of channel 0’s 
DMA channel control register to 1 (one). 


9.5.2 Rotating Priority Scheme 


In a rotating priority scheme, the last channel serviced becomes the lowest
priority channel. The other channels sequentially rotate through the priority
list, with the next lowest channel from the just-serviced channel becoming
the highest priority on the following request. The priority rotates every time
the most recent priority-granted channel completes its access. Figure 9-9
and Figure 9-11 illustrate the rotation of priority across several DMA
coprocessor accesses. At system reset, the channels are ordered from
highest to lowest priority (0, 1, 2, 3, 4, 5).


To set up this scheme, set the PRIORITY MODE bit (bit 30) of channel 0’s 
DMA control register to 0 (zero). 


The DMA coprocessor handles channel arbitration on an access-by-access
basis; that is, a DMA channel must contend for both the read and the write
access in unified mode. Each service is one read access or one write access.
See Figure 9-10 for an example of a read/write sequence.


At the start of the example in Figure 9-9, channels 2, 4, and 5 are requesting
service. Since channel 2 has the highest priority, it is serviced first. It then
becomes the lowest priority channel, and channel 3 becomes the highest
priority channel. On the following services, channels 4 and 5 are serviced
in a similar fashion.
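
The rotation in this example can be reproduced with a short Python model. It is a sketch under stated assumptions: the function name rotate_service and the list-based representation of the priority order are invented for illustration, and every listed channel is assumed to keep requesting service after each access.

```python
def rotate_service(order, requesting):
    """Service the highest-priority requesting channel, then rotate.

    order: channel numbers, highest priority first.
    Returns (serviced channel, new order); (None, order) if no requests.
    """
    for i, ch in enumerate(order):
        if ch in requesting:
            # The channel just below the serviced one becomes highest;
            # the serviced channel drops to the bottom of the list.
            return ch, order[i + 1:] + order[:i + 1]
    return None, order


order = [0, 1, 2, 3, 4, 5]      # ordering at system reset
requesting = {2, 4, 5}          # channels requesting, as in the example
serviced = []
orders = []
for _ in range(4):              # each service is one read or one write
    ch, order = rotate_service(order, requesting)
    serviced.append(ch)
    orders.append(order)
```

After the first service the list in this model is 3, 4, 5, 0, 1, 2, which matches the statement that channel 3 becomes the highest priority channel once channel 2 has been serviced.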


Figure 9-10. DMA Read and Write Sequence

T  DMA channel requesting an access
Another way to visualize the rotation of priorities is shown in Figure 9-11.
This example shows the same results as in Figure 9-9 and helps to make
clear the rotating nature of the priority scheme. Priority decreases from
highest to lowest in a clockwise direction. The priority rotates in a
counterclockwise direction, with the most recently serviced channel ending
up in the lowest priority position.


Figure 9-11.
With the rotating priority scheme, any DMA channel requesting service is
guaranteed to be recognized after a number of higher priority requests have
been serviced. The maximum number of such requests is:

- five in unified mode

- eleven in split mode

This provides a fair means of preventing a channel from monopolizing the 
system. 


DMA channels that are running and are not synchronized via interrupts are 
always requesting service. 


9.5.3 Split Mode and DMA Channel Arbitration 
When a DMA channel is running in split mode, arbitration between channels
is similar to that just discussed. The split-mode DMA channel has the same
priority as the unified DMA channel. The only issue is how to arbitrate
between the primary split channel and the auxiliary split channel. The two
split channels alternate priority with each other under a rotating priority
scheme.


When a DMA channel is in split mode and both paths are simultaneously 
reset via the START and AUX START bits, the output (primary) channel has 
priority over the input (auxiliary) channel. Both the START and AUX START 
bits must be written at the same time in order to achieve this reset condition. 
The priority scheme for channels is slightly different from the scheme for
primary and auxiliary channels within a channel:

- for channels, priority changes after a read or a write

- for the primary and auxiliary channels within a channel, priority changes
  after a complete read and write


Figure 9-12 is an example of two channels contending for the DMA bus:
channel 2 (a split channel) and channel 4. In this case:

- only channel 2 (i.e., not channel 4) is being run in split mode

- its primary channel is identified as 2pri

- its auxiliary channel is identified as 2aux

- the paths requesting service are identified with a T


In the example described below, channel 4 will do one complete transfer 
(read and write) for each complete transfer of either channel 2pri or 2aux. 


Figure 9-12. Example of a Channel Priority Scheme in Split Mode 


Highest priority channel 0 


Lowest priority channel 5 


T DMA channel requesting an access 

+ Split channels requesting access 

2pri = the primary split channel of channel 2 
2aux = the auxiliary split channel of channel 2 


The channel priority scheme in Figure 9-12 is further shown sequentially
in Figure 9-13 (on the next page):

1) The first service is a request by the primary split channel of channel 2
   (2pri). 2pri reads, and then channel 2 is moved to the lowest priority
   level, but 2pri remains the higher priority channel of channel 2.

2) On the second service, channel 4, now a higher priority than channel
   2, reads from its source address.

3) On the third service, the value read by 2pri is written to its destination
   address, and channel 2 is moved to the lowest priority level. Also, 2pri
   is moved to a lower priority than 2aux, channel 2’s auxiliary channel.
   Note that the split channel that just completed a read retains a higher
   priority than the other split channel until the data is written to the
   destination address.

4) On the fourth service, the value read by channel 4 in service 2 is now
   written to its destination address, and channel 4 becomes the lowest
   priority.

5) In the fifth service, 2aux is read and channel 2 becomes the lowest
   priority.

6) On the sixth service, channel 4 is read again, and it becomes the lowest
   priority.

7) On the seventh and eighth services, the 2aux and channel 4 values
   that were read in services 5 and 6 are now written to their destination
   addresses. After each channel is written, it assumes the lowest priority.

8) In the ninth service, 2pri is read again as in the first service, and the
   read/write cycle continues as begun in the first service.


Figure 9-13. Service Sequence for Split Mode
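
The nine services described above can be replayed with a small Python model. This is a sketch, not a description of the hardware implementation: the function name split_mode_sequence and its bookkeeping variables are hypothetical, and the model assumes that channel 2 (both split channels) and channel 4 request service continuously, starting from the reset priority order.

```python
def split_mode_sequence(n_services):
    """Replay the channel 2 (split) versus channel 4 example."""
    order = [0, 1, 2, 3, 4, 5]        # highest priority first, as at reset
    requesting = {2, 4}
    phase = {2: "read", 4: "read"}    # next access each channel needs
    sub = "pri"                       # split channel of channel 2 with priority
    log = []
    for _ in range(n_services):
        ch = next(c for c in order if c in requesting)
        name = f"2{sub}" if ch == 2 else str(ch)
        log.append(f"{name} {phase[ch]}")
        if phase[ch] == "read":
            phase[ch] = "write"
        else:
            phase[ch] = "read"
            if ch == 2:
                # Priority between 2pri and 2aux changes only after a
                # complete read and write.
                sub = "aux" if sub == "pri" else "pri"
        # Between channels, priority changes after every single access.
        i = order.index(ch)
        order = order[i + 1:] + order[:i + 1]
    return log


sequence = split_mode_sequence(9)
```

In this model, channel 4 completes one read and write for each complete transfer of either 2pri or 2aux, as stated before Figure 9-12.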
9.6 CPU and DMA Coprocessor Arbitration


The DMA coprocessor has its own internal buses for transferring data.
Arbitration between the DMA coprocessor and the CPU is necessary only
when a resource conflict exists between the two.


When the CPU and DMA coprocessor arbitrate for memory access, the
memory address and the channel’s DMA PRI bits (bits 0 and 1 of the
channel control register) are used in the arbitration scheme. These bits are
described in Table 9-7 below. Higher priority DMA channels will be serviced
before lower priority DMA channels whose requested address does not
conflict with a CPU access or that have higher priority than the CPU.


The DMA PRI bits of the channel control register (of the DMA channel arbi- 
trating with the CPU) define the arbitration rules. These rules apply when- 
ever the CPU and the highest priority requesting channel request the same 
resource. Otherwise, the CPU and DMA coprocessor access may proceed 
in parallel. 


All arbitration between the CPU and the DMA coprocessor is on an access 
basis; that is, the DMA coprocessor must contend for the read and the write 
accesses of a DMA transfer in unified mode and split mode. 


Table 9-7. DMA PRI Bits and CPU/DMA Arbitration Rules

DMA PRI
(Bits 1-0)   Arbitration Rule

00₂          DMA access is lower priority than the CPU access. If the DMA
             channel and the CPU are requesting the same resource, then
             the CPU will proceed. (DMA PRI bits are set to 00₂ at reset.)

01₂          If the DMA channel and the CPU are requesting the same
             resource, then the CPU will proceed. Then, after the CPU
             access is complete, if the DMA coprocessor and CPU are again
             requesting the same resource, the DMA coprocessor will
             proceed. This priority rule provides a fair arbitration scheme
             by alternating CPU accesses with a DMA channel's access.

10₂          Reserved

11₂          DMA access is higher priority than the CPU access. If the DMA
             channel and the CPU are requesting the same resource, the
             DMA will proceed.
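
The rules of Table 9-7 can be written as a small decision function. The Python sketch below is illustrative only: the name arbitrate and the cpu_won_last argument (tracking who proceeded on the previous conflict) are assumptions introduced to model the alternation described for DMA PRI = 01; they are not register or signal names from the device.

```python
def arbitrate(dma_pri, cpu_won_last=False):
    """Return 'CPU' or 'DMA' when both request the same resource."""
    if dma_pri == 0b00:
        return "CPU"                  # DMA is lower priority than the CPU
    if dma_pri == 0b01:
        # Alternate: the CPU proceeds first, then the DMA, and so on.
        return "DMA" if cpu_won_last else "CPU"
    if dma_pri == 0b11:
        return "DMA"                  # DMA is higher priority than the CPU
    raise ValueError("DMA PRI = 10 is reserved")


# Four back-to-back conflicts with DMA PRI = 01.
winners = []
cpu_won_last = False
for _ in range(4):
    winner = arbitrate(0b01, cpu_won_last)
    winners.append(winner)
    cpu_won_last = (winner == "CPU")
```

The function only applies when the CPU and the highest priority requesting channel want the same resource; otherwise both accesses proceed in parallel and no arbitration takes place.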
9.7 Data Transfer Modes


Each DMA channel can operate in four types of data transfer modes. These
modes differ in:

- whether or not they use autoinitialization

- how they operate, whether or not autoinitialization is in effect


Table 9-8 and the following paragraphs describe these data transfers. 
Table 9-8. TRANSFER MODE Field Description Summary

TRANSFER MODE
(Bits 3-2
and 5-4)       Transfer Mode Summary

00₂            Transfers are not terminated by the transfer counter. No
               autoinitialization is performed. The TCINT (transfer count
               interrupt) bits can still be used to cause an interrupt when
               the transfer counter makes a transition to zero. The DMA
               channel continues to run.

01₂            Transfers are terminated by the transfer counter. No
               autoinitialization is performed. A halt code of 10₂ is placed
               in the START field (bits 22-23 and bits 24-25 of the DMA
               channel control register) when transfers are terminated.

10₂            The DMA channel is autoinitialized when the transfer counter
               goes to zero, without waiting for the CPU (see subsection
               9.7.3).

11₂            The DMA channel is autoinitialized when the CPU restarts the
               DMA coprocessor by using the DMA channel control register in
               the CPU. When the transfer counter goes to zero, operation is
               halted until the CPU starts the DMA coprocessor by using the
               START field in the DMA channel control register. A halt code
               of 10₂ is placed in the START field by the DMA.


9.7.1 Running Under TRANSFER MODE = 00₂


When TRANSFER MODE = 00₂, transfers are not terminated when the
transfer counter goes to zero, and no autoinitialization is performed. Even
though the transfer counter does not halt transfers, an interrupt can be
generated on the transfer counter transition to zero, causing TCINT FLAG
bit = 1. If the DMA coprocessor channel is not halted after the transfer
counter reaches zero, the counter will continue decrementing below zero.


9.7.2 Running Under TRANSFER MODE = 01₂


When TRANSFER MODE = 01₂, transfers are terminated when the transfer
counter goes to zero, and no autoinitialization is performed. When the
transfer counter goes to zero, the DMA channel is halted by forcing 10₂ into
the START or AUX START field.
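
The difference between transfer modes 00 and 01 can be seen in a short Python model of the transfer counter. This is a sketch under stated assumptions: the function name run_transfers is hypothetical, the counter is assumed to be 32 bits wide and to wrap when it decrements below zero, and the transfer-count interrupt is assumed to be enabled so that the flag is set on the transition to zero.

```python
MASK32 = 0xFFFFFFFF     # assumption: 32-bit transfer counter


def run_transfers(transfer_mode, count, max_transfers):
    """Return (transfers done, counter, START field, TCINT flag)."""
    start = 0b11                      # channel running
    tcint_flag = 0
    done = 0
    while start == 0b11 and done < max_transfers:
        done += 1
        count = (count - 1) & MASK32  # counter decrements per transfer
        if count == 0:
            tcint_flag = 1            # transition to zero detected
            if transfer_mode == 0b01:
                start = 0b10          # halt code forced into START
    return done, count, start, tcint_flag


mode01 = run_transfers(0b01, 3, 10)   # halts after three transfers
mode00 = run_transfers(0b00, 3, 5)    # keeps running past zero
```

In mode 00 the model keeps transferring until the caller's limit, with the counter wrapping below zero; in mode 01 it stops on its own with 10 in the START field.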
9.7.3 Running Under TRANSFER MODE = 10₂


This transfer mode allows the DMA channel to take care of itself. It can run
continuously, change pointers and synchronization by the autoinitialization
procedure, and turn itself off.


This mode always starts with the DMA channel reset (START or AUX
START = 00₂) or halted (these bits = 01₂ or 10₂) and the transfer counter at
0. This occurs after a system reset, after the DMA channel is software reset
(00₂ written to the START or AUX START bits), or after the channel is halted
(01₂ or 10₂ written to these bits). The process for setting up and running
a DMA channel under transfer mode 10₂ is summarized in Figure 9-14.


1) After placing the DMA channel in the reset or halted state and the
   transfer counter at 0, initialize the channel for the desired operation.
   In this case, set the transfer mode bits to 10₂. Since the DMA channel
   autoinitializes itself when started under this mode, the CPU needs only
   to initialize the DMA channel control register and the DMA channel link
   pointer. The other DMA channel registers are automatically set up by the
   autoinitialization process. Synchronization of reads and writes is allowed.

2) After initializing the DMA channel, the channel can be started by writing
   11₂ to the START or AUX START bits.

3) After this, the DMA channel will perform the sequence: autoinitialize
   and do a block transfer.
9.7.4 Running Under TRANSFER MODE = 11₂


This transfer mode, besides having all of the advantages of
autoinitialization, allows the CPU to coordinate its operation easily with
the operation of the DMA channels.


Like transfer mode 10₂ (see subsection 9.7.3), this mode always starts out
with the DMA channel reset or halted and the transfer count = 0. This occurs
after a system reset, after the DMA channel is reset by writing 00₂ to the
START or AUX START bits in the DMA channel control register, or after the
channel is halted by writing 01₂ or 10₂ to these bits. The process for setting
up and running a DMA channel under transfer mode 11₂ is summarized in
Figure 9-15.


1) After placing the DMA channel in the halted or reset state and the
   transfer counter = 0, initialize the channel for the desired mode of
   operation. In this case, set the TRANSFER MODE bits to 11₂. Since the
   DMA channel autoinitializes itself when started under this mode, the CPU
   needs to initialize only the DMA channel control register and the DMA
   channel link pointers. The other DMA channel registers are set up by the
   autoinitialization procedure.

2) After initializing the DMA channel, the channel can be started by writing
   11₂ to the START bits.

3) Then, the DMA channel autoinitializes itself and does a block transfer.
   When the transfer counter goes to zero, the channel waits for the CPU to
   write 11₂ to the START field of the DMA channel control register.

4) Then repeat the sequence autoinitialize, transfer, and wait.

5) When the transfer count goes to zero, the DMA channel can be halted
   by forcing 10₂ into the START or AUX START field.


Figure 9-15. Running a DMA Channel Under Transfer Mode 11₂
9.8 Autoinitialization


When the DMA channel is operating in autoinitialization mode, the link
pointer register and auxiliary link pointer register are used to initialize the
registers that control the operation of the DMA channel. These pointers are
memory-address locations for blocks of data that are to be loaded into the
DMA register file, as shown in Figure 9-1 and Figure 9-2, beginning on page
9-4.


How this autoinitialization is done depends upon the current mode of
operation of the DMA channel and the mode to which it is being
autoinitialized. In all cases, either the link pointer or auxiliary link pointer
(used in DMA split channel mode) contains the address of a block of memory
that contains the new DMA channel register values (registers shown in
Figure 9-1 on page 9-4).


During autoinitialization, the link pointer may be incremented (AUTO INIT
STATIC = 0) or held constant (AUTO INIT STATIC = 1). (This is bit 8 or 9
of the channel control register, Figure 9-3 on page 9-8.)


- When the link pointer is incremented, the autoinitialization values are
  stored in sequential memory locations, and the link pointer or auxiliary
  link pointer is incremented in order to access each of these locations.


- Holding the link pointer constant is very useful when autoinitializing
  the DMA channel from a stream-oriented device such as the on-chip
  communication ports or external FIFOs.


The SPLIT MODE bit (bit 14 in Figure 9-3 on page 9-8) defines the mode 
under which the DMA channel is currently running. When autoinitializing the 
DMA coprocessor, do not change the SPLIT MODE bit. This bit should be 
changed only when the DMA coprocessor has been reset and halted (see 
DMA START bit description, Table 9-5 on page 9-15). 


Autoinitialization is a DMA operation to the DMA coprocessor’s registers; 
i.e., it reads the value pointed to by the link pointer and writes the value to 
the DMA register over the peripheral bus on the next available cycle. 




If the DMA channel is performing memory-to-memory transfers
(SPLIT MODE = 0), the link pointer is used. The DMA channel registers are
loaded in the following order:

1) DMA channel control register 

2) Source address register 

3) Source address index register 

4) Transfer count register 

5) Destination address register 

6) Destination address index register 

7) Link pointer register 


The storage of new values for these registers in memory is illustrated in 
Figure 9-16. 


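Figure 9-16. Store New Values of DMA Channel Registers in Memory (SPLIT MODE = 0)

Map of New Register Values in Memory

    Link Pointer (+0) -->  DMA channel control register
                  +1       Source address register
                  +2       Source address index register
                  +3       Transfer count register
                  +4       Destination address register
                  +5       Destination address index register
                  +6       Link pointer register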


If the DMA channel is running in split mode (SPLIT MODE = 1), then the 
autoinitialize sequence depends upon which counter has terminated. 


If the transfer-count register has gone to zero with SPLIT MODE = 1,
then the link-pointer register is used for autoinitialization. In this case, the
DMA channel registers are loaded in the following order:

1) DMA channel control register

2) Source address register 

3) Source address index register 

4) Transfer count register 

5) Link pointer register 


The storage of the new values for these registers in memory is illustrated 
in Figure 9-17. 
Figure 9-17. Store New Values of DMA Channel Registers in Memory (SPLIT MODE = 1)

Map of New Register Values in Memory

    Link Pointer (+0) -->  DMA channel control register
                  +1       Source address register
                  +2       Source address index register
                  +3       Transfer count register
                  +4       Link pointer register


If the auxiliary transfer counter has gone to zero with SPLIT MODE = 1, then
the auxiliary link pointer register is used for autoinitialization. In this case,
the DMA channel registers are loaded in the following order:

1) DMA channel control register

2) Destination address register

3) Destination address index register

4) Auxiliary transfer count register

5) Auxiliary link pointer register


The storage of the new values of these registers in memory is illustrated in
Figure 9-18.


Figure 9-18. Store New Values of DMA Channel Registers in Memory (SPLIT MODE = 1 and Auxiliary
Transfer Counter = 0)

Map of New Register Values in Memory
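
The three load orders listed in this section can be captured in a small lookup table, which is handy when preparing an autoinitialization block in memory. The Python sketch below is illustrative: the names LOAD_ORDER and build_block, the short register keys, and the example values are all assumptions made for the example rather than identifiers from the device documentation.

```python
# Register load order for each autoinitialization case, as listed above.
LOAD_ORDER = {
    "unified": [
        "control", "source_address", "source_index", "transfer_count",
        "destination_address", "destination_index", "link_pointer",
    ],
    "split_primary": [
        "control", "source_address", "source_index", "transfer_count",
        "link_pointer",
    ],
    "split_auxiliary": [
        "control", "destination_address", "destination_index",
        "aux_transfer_count", "aux_link_pointer",
    ],
}


def build_block(case, values):
    """Return the words to store at link pointer +0, +1, ... for a case."""
    return [values[name] for name in LOAD_ORDER[case]]


# Hypothetical values for a split-mode primary channel block.
block = build_block("split_primary", {
    "control": 0x00C0400B,
    "source_address": 0x002FF800,
    "source_index": 1,
    "transfer_count": 64,
    "link_pointer": 0x002FFC00,
})
```

A missing register value raises a KeyError in this sketch, which makes an incomplete block easy to spot before it is written to memory.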
Usually, autoinitialization data will be stored in memory. In this case,
synchronization for autoinitialization is not generally necessary. To disable
the synchronization of data reads during autoinitialization, set AUTOINIT
SYNCH (bits 10 and 11, DMA channel control register) to 0. In some cases,
you may wish to transfer autoinitialization data in the same way as the
synchronized data reads and writes. To synchronize autoinitialization based
upon the interrupt identified with the READ SYNCH and WRITE SYNC
fields (DIE register, page 3-8), set both the AUTOINIT SYNCH and AUX
AUTOINIT SYNCH bits (bits 10-11 of the DMA channel control register) to 1.
In this way, autoinitialization will be synchronized only with the SYNCH
MODE in effect.


The data reads for autoinitialization are arbitrated for by the DMA channels 
just like a typical DMA access. The only difference is that their synchroniza- 
tion is controlled by AUTOINIT SYNCH. A summary of the autoinitialization 
effect of the SYNC MODE and AUTOINIT SYNC bits is listed in Table 9-9 
on page 9-37. This table pertains to autoinitialization only. 


In unified mode, all of the writable control register bits are affected by
autoinitialization. These bits are labeled in Figure 9-19.


In split mode during autoinitialization of the primary DMA channel, the
writable, nonauxiliary bits may be modified, but auxiliary bits are protected
(these bits are in Figure 9-20). In other words, only nonauxiliary bits are
allowed to be modified by the CPU or DMA coprocessor. Also, if the auxiliary
DMA channel is autoinitialized, the writable auxiliary bits may be modified,
but nonauxiliary bits are protected. These bits are labeled in Figure 9-21.


In all autoinitialization modes, the shadowed bits (Figure 9-19) that are
writable (W-designated bits in Figure 9-3) do not take effect until
autoinitialization is complete. Unshadowed bits affect the autoinitialization
sequence. In other words, at autoinitialization, shadowed bit values will be
entered last, after all registers are loaded (as specified by the link pointer).


Regardless of whether the DMA channel is running in unified mode or split 
mode, writes by the CPU or another external source to the DMA channel 
control register affect all writable bits. 
Figure 9-19. DMA Channel Control Register Bits That Can Be Modified by Autoinitialization Under
Unified Mode

s  - These shadowed bits do not take effect until autoinitialization is complete.


Figure 9-20. DMA Channel Control Register Bits That Can Be Modified by Autoinitialization of the
Primary Channel Under Split Mode

s  - These shadowed bits do not take effect until autoinitialization is complete.
xx - Write protected during primary channel autoinitialization.


Figure 9-21. DMA Channel Control Register Bits That Can Be Modified by Autoinitialization of the
Auxiliary Channel Under Split Mode

s  - These shadowed bits do not take effect until autoinitialization is complete.
xx - Write protected during auxiliary channel autoinitialization.
Autoinitialization synchronization is a function of

- the SYNC MODE bits (DMA channel control register bits 6 and 7) that
  control synchronization of data transfers, and

- the AUTOINIT SYNC bits (DMA channel control register bits 10 and 11)
  that affect only autoinitialization synchronization.


If the SYNC MODE bits are not set to synchronize data transfers (i.e., if the
preceding data transfer is not synchronized on interrupts), then the DMA
channel autoinitialization sequence will not be synchronized either. If the
SYNC MODE bits are set to transfer data synchronously (i.e., if the
preceding data transfer is synchronized), then the upcoming DMA channel
autoinitialization sequence may be synchronized on either reads or writes or
both (depending on whether the DMA coprocessor is in unified or split
mode) as shown in Table 9-9. Note that for all combinations of the SYNC
MODE and AUTOINIT SYNC bits not shaded in the table, the DMA channel
autoinitialization sequence is not synchronized on interrupts.


Table 9-9. Effect of SYNC MODE and AUTOINIT SYNC Bits in Autoinitialization

These Bits of the DMA Channel       Cause Autoinitialization
Control Register                    Synchronization To Occur On

SYNC MODE      AUTOINIT SYNC
(Bits 7-6)     (Bits 11-10)         Unified Mode     Split Mode

0 0            0 0                  no sync          no sync
0 0            0 1                  no sync          no sync

[The remaining rows of the table, including the shaded (synchronized)
combinations, are not recoverable from this copy.]
9.8.1 Fun With Link Pointers


For many applications, it is sufficient to autoinitialize the DMA channel with
the same data each time. In this case, the new link-pointer value points to
the start of the same block of data containing the new link pointer, as
illustrated in Figure 9-22. This particular example assumes a DMA channel
that is not running in split mode.


If you want, you can get fancier. The new link pointer may point to a new set
of register values as illustrated in Figure 9-23. This may be continued to any
level you like. Have fun!


Figure 9-22. Self-Referential Link Pointer
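
Both arrangements, a block that points back to itself and a chain of blocks, can be illustrated by following the link pointers through a small memory model. The Python sketch below rests on assumptions: the function name follow_links and the addresses 0x100, 0x200, and 0x300 are invented for the example, and each block is taken to be the seven-word unified-mode block with the new link pointer stored in its last word.

```python
BLOCK_LEN = 7     # unified-mode block; the new link pointer is word +6


def follow_links(memory, link_pointer, rounds):
    """Return the block addresses used by successive autoinitializations."""
    visited = []
    for _ in range(rounds):
        visited.append(link_pointer)
        # The last word of the block becomes the next link pointer.
        link_pointer = memory[link_pointer + BLOCK_LEN - 1]
    return visited


# Self-referential block: its link-pointer word points at its own start.
self_ref = {0x100 + BLOCK_LEN - 1: 0x100}
same_each_time = follow_links(self_ref, 0x100, 3)

# Two-block chain: block A links to block B, and B links back to A.
chain = {0x200 + BLOCK_LEN - 1: 0x300, 0x300 + BLOCK_LEN - 1: 0x200}
alternating = follow_links(chain, 0x200, 4)
```

Only the link-pointer word of each block is populated here, since that is the only word that decides which block is loaded next.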
9.9 DMA Coprocessor and Interrupts


All of the interrupts that the DMA coprocessor can see are first received by 
the CPU interrupt controller. If these interrupts are edge triggered, they are 
latched by the CPU in the appropriate interrupt-flag register. The edge-trig- 
gered interrupts are timer interrupts, DMA interrupts, and external inter- 
rupts that are configured as edge-triggered interrupts. Detailed information 
on interrupts is provided in Section 6.7 on page 6-23. 


For edge-triggered interrupts, when the interrupt controller determines 
that the interrupt a DMA channel is waiting on has been latched into the in- 
terrupt flag, the CPU clears the interrupt flag and sends an interrupt pulse 
to the DMA channel. The DMA channel latches the interrupt locally until it 
can service the interrupt. At that time, the latched interrupt is cleared by the 
DMA coprocessor for two cycles. 


Level-triggered interrupts that are generated by the communication ports
and external interrupts that are configured as level-triggered interrupts are
handled differently by the CPU interrupt controller. For level-triggered
interrupts, when the interrupt controller determines that the interrupt a DMA
channel is waiting on has been received (recall that level-triggered
interrupts are not latched by the CPU interrupt controller), the CPU sends an
interrupt pulse to the DMA channel. The DMA channel latches the interrupt
locally until it can service the interrupt. At that time, the locally latched
interrupt is cleared by the DMA coprocessor for two cycles.


The interrupt reset signal generated by the DMA coprocessor after a DMA
interrupt is serviced has higher priority than the interrupt set signal. Thus,
the interrupt signal won’t be continuously set even if the CPU is continuously
sending the interrupt set signal. Hence, when the DMA-set priority scheme
is used and a higher priority DMA channel is driven by continuous interrupt
signals, the lower priority DMA channel can be serviced in between the
higher priority DMA services.


The internal circuitry of the TMS320C40 guarantees proper operation
between a communication port that generates level-triggered interrupts and
the DMA channel that is synchronizing with those level-triggered interrupts.
However, when you synchronize the DMA channels with external interrupts,
it is better that these interrupt lines be configured as edge-triggered
interrupts to ensure that only one interrupt is recognized.


Subsection 9.9.1 describes using interrupts to synchronize the DMA 
coprocessor. The interrupt mode for each channel is first selected in the 
DMA interrupt enable register, described with the CPU registers in subsec- 
tion 3.1.8 on page 3-8. 
9.9.1 Interrupts and Synchronization of DMA Channels


DMA channel transfers may be synchronized through the use of interrupts.
The interrupt used is first selected by using the DMA interrupt enable
register (subsection 3.1.8 on page 3-8).


Table 9-5 (page 9-15) describes the relationship between the SYNC MODE
bits of the DMA channel control register and the four synchronization
mechanisms performed:

- No synchronization (SYNC MODE = 0 0₂)

- Source synchronization (SYNC MODE = 0 1₂)

- Destination synchronization (SYNC MODE = 1 0₂)

- Source and destination synchronization (SYNC MODE = 1 1₂)


If the DMA split mode is selected, bits 6 and 7 of the DMA channel control
register (page 9-15) are used to control channel synchronization:

- bit 6 controls primary channel synchronization

- bit 7 controls auxiliary channel synchronization


No Synchronization (SYNC MODE = 0 0₂)


When SYNC MODE = 0 0₂, no synchronization is performed. The DMA
performs reads and writes whenever it has the priority to use the DMA bus.
All interrupts are ignored and, therefore, are considered to be globally
disabled. However, no bits in the DMA interrupt enable register are
changed. Figure 9-24 shows the synchronization mechanism when SYNC
MODE = 0 0₂.


If an external interrupt is used for DMA interrupt synchronization, the
external pin must be configured as a DMA interrupt pin (the DMA interrupt
enable register is explained in subsection 3.1.8 on page 3-8).


Figure 9-24. No DMA Synchronization

[Figure: flow of three boxes: Disable DMA interrupts globally; DMA channel
performs a read; DMA channel performs a write.]
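
The SYNC MODE encodings listed at the start of this subsection can be decoded from a control register value with a few shifts and masks. The Python sketch below is illustrative: the function name decode_sync and the returned strings are hypothetical, and it assumes that the two-digit SYNC MODE codes are written with bit 7 first and bit 6 second, and that SPLIT MODE is bit 14 as stated in section 9.4.

```python
SYNC_NAMES = {
    0b00: "none",
    0b01: "source",
    0b10: "destination",
    0b11: "source and destination",
}


def decode_sync(control):
    """Interpret the SYNC MODE field of a channel control register value."""
    sync = (control >> 6) & 0b11          # bits 7-6
    if (control >> 14) & 1:               # SPLIT MODE = 1
        # In split mode each bit enables synchronization for one channel.
        return {"primary": bool(sync & 0b01), "auxiliary": bool(sync & 0b10)}
    return SYNC_NAMES[sync]


unified_source = decode_sync(0b01 << 6)
split_aux_only = decode_sync((1 << 14) | (0b10 << 6))
```

In unified mode the function returns one of the four mechanism names; in split mode it reports the primary and auxiliary channels separately.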
Source Synchronization (SYNC MODE = 0 1₂)


When SYNC MODE = 0 1₂, the DMA coprocessor is synchronized to the
source (see Figure 9-25). A read will not be performed until an interrupt is
received by the DMA coprocessor (this also applies to the DMA primary
channel in split mode as shown in Figure 9-25). Then, all DMA interrupts
are disabled globally. However, no bits in the DMA interrupt enable register
are changed.


Figure 9-25. DMA Source Synchronization

[Figure, two flows.
(a) DMA channel in unified mode: Start; Idle until enabled interrupt is
received; Disable DMA interrupts globally; DMA channel performs a read;
DMA channel performs a write; Enable DMA interrupts globally.
(b) Primary channel in split mode: Idle until enabled interrupt is received;
Disable DMA interrupts globally; DMA channel performs a read; Write data
to communication port output FIFO; Enable DMA interrupts globally.]
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Destination Synchronization (SYNC MODE = 1 09) 


When SYNC MODE = 1 0₂, the DMA coprocessor is synchronized to the
destination in unified mode. First, all interrupts are ignored until the read is
complete. (Though the DMA interrupts are considered to be globally
disabled, no bits in the DMA interrupt enable register are changed.) A write
will not be performed until an interrupt is received by the DMA coprocessor.
Figure 9-26 shows the synchronization mechanism when SYNC MODE =
1 0₂ in unified mode.


For the auxiliary channel in split mode, synchronization is similar to primary
channel synchronization. The exception is that for the primary channel, the
data is read from memory and written to a communication port output FIFO
(shown on the right side of Figure 9-26). The auxiliary channel can read
from a communication channel and write data to a memory address.


Figure 9-26. DMA Destination Synchronization 


[Flowcharts. (a) Unified mode: DMA interrupts are disabled globally; DMA
channel performs a read; DMA interrupts are enabled globally; idle until
enabled interrupt is received; disable DMA interrupts globally; DMA
channel performs a write.
(b) Auxiliary channel in split mode: idle until enabled interrupt is
received; disable DMA interrupts globally; read data from communication
port FIFO; DMA channel performs a write; enable DMA interrupts globally.]


Source and Destination Synchronization (SYNC MODE = 1 1₂)


When SYNC MODE = 1 1₂, a read is performed when a read interrupt is
received, and a write is performed on the write interrupt. If a write interrupt
is received before a read interrupt, the write interrupt is latched and the DMA
data write won't be executed until the read is completed. If DMA split mode
is selected, it reacts as two independent synchronizations for the primary
and auxiliary channels. Source and destination synchronization when
SYNC MODE = 1 1₂ is shown in Figure 9-27.
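
The latching rule described above can be sketched as a small behavioral
model. This is only an illustration, not the device logic: the function
name, the "R"/"W" event encoding, and the choice to ignore a second read
interrupt while a word is still waiting to be written are all assumptions
made for the sketch.

```python
def sync_source_and_destination(events):
    """Model one unified-mode channel with source and destination sync.

    events: iterable of "R" (read interrupt) or "W" (write interrupt).
    Returns the ordered list of DMA actions ("read" / "write").
    """
    actions = []
    data_pending = False    # a word has been read but not yet written
    write_latched = False   # a write interrupt arrived before the read
    for event in events:
        if event == "R":
            if data_pending:
                continue    # assumption: extra read interrupts are ignored
            actions.append("read")
            data_pending = True
            if write_latched:
                # The latched write interrupt is honored after the read.
                actions.append("write")
                data_pending = False
                write_latched = False
        elif event == "W":
            if data_pending:
                actions.append("write")
                data_pending = False
            else:
                write_latched = True   # latch until the read completes
        else:
            raise ValueError("event must be 'R' or 'W'")
    return actions
```

In this model a write interrupt that arrives first never produces a write
on its own; the write appears only after the matching read.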


Figure 9-27. DMA Source and Destination Synchronization 


[Flowchart: idle until enabled interrupt is received; disable DMA
interrupts globally; DMA channel performs a read; enable DMA interrupts
globally; idle until enabled interrupt is received; disable DMA interrupts
globally; DMA channel performs a write; enable DMA interrupts globally.]


9.10 TMS320C40 Timers


The TMS320C40 timer modules are general-purpose, 32-bit, timer/event 
counters, with two signaling modes and internal or external clocking (see 
Figure 9-28). The timer modules can be used to signal to the TMS320C40 
or to the external world at specified intervals, or to count external events. 
With an internal clock, the timer can be used to signal an external A/D
converter to start a conversion, or it can interrupt the TMS320C40 DMA
controller to begin a data transfer. With an external clock, the timer can
count external events and interrupt the CPU after a specified number of
events. Available to each timer is an I/O pin that can be used as an input
clock to the timer, an output clock signal, or a general-purpose I/O pin.


Figure 9-28. Timer Block Diagram

[Block diagram. Labeled elements: Internal Clock/2 and External Clock
inputs; 32-bit Period Register (31-0); 32-bit counter; Comparator
(Period = Counter?); Pulse Generator; TSTAT; Timer Out.]


Three memory-mapped registers are used by each timer:

❏ Global-control register

❏ Period register

❏ Counter register
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The timer global-control register determines the operating mode of the 
timer, monitors the timer status, and controls the function of the I/O pin of 
the timer. The period register specifies the timer’s signaling frequency. The 
timer counter register contains the current value of the incrementing 
counter. The timer can be incremented on the rising edge or the falling edge 
of the input clock. The counter is zeroed whenever its value equals that in 
the period register. The pulse generator generates two types of external 
clock signals: pulse or clock. The memory map for the timer modules is
shown in Figure 9-29.


Figure 9-29. Memory-Mapped Timer Locations


Register                    Peripheral Address
                            Timer 0     Timer 1
Timer global control        808020h     808030h
Reserved                    808021h     808031h
Reserved                    808022h     808032h
Reserved                    808023h     808033h
Timer counter               808024h     808034h
Reserved                    808025h     808035h
Reserved                    808026h     808036h
Reserved                    808027h     808037h
Timer period                808028h     808038h
Reserved                    808029h     808039h
Reserved                    80802Ah     80803Ah
Reserved                    80802Bh     80803Bh
Reserved                    80802Ch     80803Ch
Reserved                    80802Dh     80803Dh
Reserved                    80802Eh     80803Eh
Reserved                    80802Fh     80803Fh
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9.10.1 Timer Global-Control Register 


The timer global control register is a 32-bit register that contains the global
and port control bits for the timer module. Table 9-10 defines the register
bits, names, and functions. Bits 3-0 are the port control bits; bits
11-6 are the timer global control bits. Figure 9-30 shows the 32-bit
register. Note that at reset, all bits are set to 0 except for DATIN (set to
the value read on TCLK).


Figure 9-30. Timer Global-Control Register 
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NOTE: xx = reserved bit, read as 0. 
R = read, W = write. 
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Table 9-10. Timer Global-Control Register Bits Summary 


Bit     Name     Function

0       FUNC     FUNC controls the function of TCLK. If FUNC = 0, TCLK is
                 configured as a general-purpose digital I/O port. If
                 FUNC = 1, TCLK is configured as a timer pin (see
                 Figure 9-33 for a description of the relationship
                 between FUNC and CLKSRC).

1       I/O      If FUNC = 0 and CLKSRC = 0, TCLK is configured as a
                 general-purpose I/O pin. In this case, if I/O = 0, TCLK
                 is configured as a general-purpose input pin. If
                 I/O = 1, TCLK is configured as a general-purpose output
                 pin.

2       DATOUT   DATOUT drives TCLK when the TMS320C40 is in I/O port
                 mode. DATOUT can also be used as an input to the timer.

3       DATIN    Data input on TCLK or DATOUT. A write has no effect.

6       GO       The GO bit resets and starts the timer counter. When
                 GO = 1 and the timer is not held, the counter is zeroed
                 and begins incrementing on the next rising edge of the
                 timer input clock. The GO bit is cleared on the same
                 rising edge. GO = 0 has no effect on the timer.
                 Table 9-11 further defines these bits.

7       HLD      Counter hold signal. When this bit is zero, the counter
                 is disabled and held in its current state. If the timer
                 is driving TCLK, the state of TCLK is also held. The
                 internal divide-by-two counter is also held so that the
                 counter can continue where it left off when HLD is set
                 to 1. The timer registers can be read and modified while
                 the timer is being held. RESET has priority over HLD.
                 Table 9-11 shows the effect of writing to GO and HLD.

8       C/P      Clock/pulse mode control. When C/P = 1, clock mode is
                 chosen, and the signaling of the status flag and
                 external output will have a 50 percent duty cycle. When
                 C/P = 0, the status flag and external output will be
                 active for one H1 cycle during each timer period (see
                 Figure 9-31).

9       CLKSRC   Specifies the source of the timer clock. When
                 CLKSRC = 1, an internal clock with frequency equal to
                 one-half the H1 frequency is used to increment the
                 counter. The INV bit has no effect on the internal clock
                 source. When CLKSRC = 0, an external signal from the
                 TCLK pin can be used to increment the counter. The
                 external clock is synchronized internally, thus allowing
                 external asynchronous clock sources that do not exceed
                 the specified maximum allowable external clock
                 frequency. This will be less than f(H1)/2. (See
                 Figure 9-33 for a description of the relationship
                 between FUNC and CLKSRC.)

10      INV      Inverter control bit. If an external clock source is
                 used and INV = 1, the external clock is inverted as it
                 goes into the counter. If the output of the pulse
                 generator is routed to TCLK and INV = 1, the output is
                 inverted before it goes to TCLK (see Figure 9-28).
                 If INV = 0, no inversion is performed on the input or
                 output of the timer. The INV bit has no effect,
                 regardless of its value, when TCLK is used in I/O port
                 mode.

11      TSTAT    This bit indicates the status of the timer. It tracks
                 the output of the uninverted TCLK pin. This flag sets a
                 CPU interrupt on a transition from 0 to 1. A write has
                 no effect.

31-12   Reserved
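
As an illustration of how the fields in Table 9-10 combine into one
register word, the following sketch packs and unpacks the named bits. It
is a host-side model, not device code: the function names and the
Python-friendly spellings IO and CP (for I/O and C/P) are assumptions
introduced for the example, and the bit positions are those given in the
table and surrounding text.

```python
# Bit positions of the timer global-control fields (per Table 9-10).
FIELDS = {
    "FUNC": 0, "IO": 1, "DATOUT": 2, "DATIN": 3,
    "GO": 6, "HLD": 7, "CP": 8, "CLKSRC": 9, "INV": 10, "TSTAT": 11,
}
# The table states that writes to these two bits have no effect.
READ_ONLY = {"DATIN", "TSTAT"}


def encode_global_control(**bits):
    """Build a register word from keyword arguments such as GO=1, HLD=1."""
    word = 0
    for name, value in bits.items():
        if name not in FIELDS:
            raise KeyError(f"unknown field {name}")
        if name in READ_ONLY:
            raise ValueError(f"{name} cannot be written")
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")
        word |= value << FIELDS[name]
    return word


def decode_global_control(word):
    """Return a dict mapping each field name to its bit value."""
    return {name: (word >> pos) & 1 for name, pos in FIELDS.items()}
```

For example, `encode_global_control(FUNC=1, GO=1, HLD=1, CLKSRC=1)`
describes a timer that is started, not held, clocked internally, and
driving TCLK in pulse mode.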
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Table 9-11 shows the result of a write using specified values of the GO and 
HLD bits in the timer global control register. 


Table 9-11. Result of a Write of Specified Values of GO and HLD 


GO    HLD    Result

0     0      All timer operations are held. No reset is performed.

0     1      Timer proceeds from state before write.

1     0      All timer operations are held, including zeroing of the
             counter. The GO bit is not cleared until the timer is taken
             out of hold.

1     1      Timer resets and starts.
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9.10.2 Timer Period and Counter Registers 


The 32-bit timer period register is used to specify the frequency of the timer
signaling. The timer counter register is a 32-bit register that is reset to zero
whenever it increments to the value of the period register. Both registers are
set to 0 at reset. The locations of the registers are shown in Figure 9-29
on page 9-46.


Certain boundary conditions affect timer operation, such as a zero in the 
period register and an overflow of the counter. These conditions are listed 
as follows: 


❏ When the period and counter registers are zero, the operation of the
  timer is dependent upon the C/P mode selected. In pulse mode
  (C/P = 0), TSTAT is set and remains set. In clock mode (C/P = 1), the
  width of the cycle is 2/f(H1), and the external clocks are ignored.


❏ When the counter register is not 0 and the period register = 0, the
  counter will count, roll over to 0, and then behave as described
  immediately above (for both period and counter registers being zero).


❏ When the counter register is set to a value greater than the period
  register, the counter may overflow when being incremented. Once the
  counter reaches its maximum 32-bit value (0FFFF FFFFh), it simply
  clocks over to 0 and continues.


Writes from the peripheral bus override register updates from the counter 
and new status updates to the control register. 
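
The wraparound case above can be made concrete with a short calculation.
The sketch below assumes a simplified model in which the counter is
incremented once per timer clock and then compared with the period
register; the real comparison timing is not specified here, so the
function (a hypothetical helper) rejects the case where the two registers
already hold the same value.

```python
MASK32 = 0xFFFF_FFFF  # maximum 32-bit counter value


def ticks_until_match(counter, period):
    """Timer clocks until the counter first equals the period register.

    Assumed model: increment modulo 2**32, then compare. The case
    counter == period is excluded because its behavior depends on
    details this sketch does not model.
    """
    if not (0 <= counter <= MASK32 and 0 <= period <= MASK32):
        raise ValueError("registers are 32 bits wide")
    if counter == period:
        raise ValueError("counter == period is not modeled")
    # Modular distance covers both the normal and the rollover case.
    return (period - counter) & MASK32
```

With the counter loaded above the period (for example counter =
0FFFF FFFEh, period = 5), the result includes the extra clocks needed to
roll over through 0.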


9.10.3 Timer Pulse Generation 
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The timer pulse generator (See Figure 9-28) can generate several different 
external signals. These signals may be inverted with the INV bit. The two 
basic modes are pulse mode and clock mode, as shown in Figure 9-31. In 
both modes, an internal clock source has a frequency of f(H1)/2, and an 
external clock source has a maximum frequency of less than f(H1)/2. Refer 
to timer timing in Chapter 14. In pulse mode (C/P = 0), the width of the pulse 
is 1/f(H1). 
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Figure 9-31. Timer Timing 


[Timing diagrams.
(a) TSTAT and Timer Output (INV = 0) When C/P = 0 (Pulse Mode); marked
intervals: 1/f(H1), 2/f(H1), 1/f(CLKSRC), and period register/f(CLKSRC).
(b) TSTAT and Timer Output (INV = 0) When C/P = 1 (Clock Mode); marked
intervals: 1/f(CLKSRC), 2/f(H1), period register/f(CLKSRC), and
2 x period register/f(CLKSRC).]


The rate of timer signaling is determined by the frequency of the timer input
clock and the period register. The following equations are valid with either
an internal or an external timer clock:


f(pulse mode) = f(timer clock) / period register 
f(clock mode) = f(timer clock) / (2 x period register) 
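
As a worked example of the two equations, the sketch below evaluates them
with exact rational arithmetic. The function name is an assumption, and the
10 MHz timer clock in the usage note is a hypothetical value (it would
correspond to an internal clock of f(H1)/2 with H1 at 20 MHz), not a figure
taken from this chapter.

```python
from fractions import Fraction


def signaling_frequency(timer_clock_hz, period, clock_mode):
    """Timer signaling frequency in Hz.

    clock_mode=False -> pulse mode: f(timer clock) / period register
    clock_mode=True  -> clock mode: f(timer clock) / (2 x period register)
    """
    if period <= 0:
        raise ValueError("period register must be positive in this sketch")
    divisor = 2 * period if clock_mode else period
    return Fraction(timer_clock_hz, divisor)
```

With a 10 MHz timer clock and a period register value of 1250, pulse mode
signals at 8 kHz and clock mode at 4 kHz.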
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9.10.4 Timer Operation Modes 


The timer can receive its input and send its output in several different
modes, depending upon the setting of CLKSRC, FUNC, and I/O. The
four timer modes of operation are defined as follows:


❏ If CLKSRC = 1 and FUNC = 0, the timer input comes from the internal
  clock. The internal clock is not affected by the INV bit (bit 10 as shown
  in Figure 9-30 on page 9-47). In this mode, TCLK is connected to the
  I/O port control and can be used as a general-purpose I/O pin (see
  Figure 9-32). If I/O = 0, TCLK is configured as a general-purpose input
  pin whose state can be read in DATIN. DATOUT has no effect on TCLK
  or DATIN. If I/O = 1, TCLK is configured as a general-purpose output
  pin. DATOUT is placed on TCLK and can be read in DATIN.


Figure 9-32. Timer I/O Port Configurations 


[Diagrams: with I/O = 0, TCLK is an input read in DATIN and DATOUT is
not connected (NC); with I/O = 1, DATOUT drives TCLK.]


❏ If CLKSRC = 1 and FUNC = 1, the timer input comes from the internal
  clock, and the timer output goes to TCLK. This value may be inverted
  by using INV, and the value output on TCLK can be read in DATIN.


❏ If CLKSRC = 0 and FUNC = 0, the timer is driven according to the status
  of the I/O bit. If I/O = 0, the timer input comes from TCLK. This value can
  be inverted by using INV, and the value of TCLK can be read in DATIN.
  If I/O = 1, TCLK is an output pin. Then, TCLK and the timer are both
  driven by DATOUT. All 0-to-1 transitions of DATOUT increment the
  counter. INV has no effect on DATOUT. The value of DATOUT can be read
  in DATIN.


❏ If CLKSRC = 0 and FUNC = 1, TCLK drives the timer. If INV = 0, all 0-to-1
  transitions of TCLK increment the counter. If INV = 1, all 1-to-0
  transitions of TCLK increment the counter. The value of TCLK can be read
  in DATIN.


Figure 9-33 shows the four timer modes of operation. 
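
The four cases above can be summarized as a lookup. The following sketch
is only a restatement of the list in code form; the function name, argument
names, and the wording of the returned strings are assumptions made for
illustration.

```python
def timer_mode(clksrc, func, io=0, inv=0):
    """Describe the counter input and the TCLK role for given control bits."""
    if clksrc == 1 and func == 0:
        tclk = ("general-purpose output driven by DATOUT" if io
                else "general-purpose input")
        return {"counter_input": "internal clock", "tclk": tclk}
    if clksrc == 1 and func == 1:
        tclk = "timer output (inverted)" if inv else "timer output"
        return {"counter_input": "internal clock", "tclk": tclk}
    if clksrc == 0 and func == 0:
        if io:
            # TCLK and the timer are both driven by DATOUT; INV is ignored.
            return {"counter_input": "DATOUT 0-to-1 transitions",
                    "tclk": "output driven by DATOUT"}
        source = "TCLK (inverted)" if inv else "TCLK"
        return {"counter_input": source, "tclk": "input"}
    # Remaining case: CLKSRC = 0 and FUNC = 1.
    edge = "1-to-0" if inv else "0-to-1"
    return {"counter_input": f"TCLK {edge} transitions",
            "tclk": "timer clock input"}
```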


Figure 9-33. Timer Modes as Defined by CLKSRC and FUNC

[Diagrams of the four timer modes, showing Timer, TSTAT, and DATIN;
panel (d): CLKSRC = 0 (External), FUNC = 1 (Timer Pin).]
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Two characteristics of the TMS320C40 that contribute to its high perform- 
ance are pipelining and concurrent I/O and CPU operation. 


Five functional units control TMS320C40 operation: fetch, decode, read, ex- 
ecute, and DMA. Pipelining is the overlapping or parallel operations of the 
fetch, decode, read, and execute levels of a basic instruction. 


By performing input/output operations, the DMA coprocessor reduces the 
need for the CPU to do so, thereby decreasing pipeline interference and en- 
hancing the CPU’s computational throughput. 


Major topics discussed in this chapter are as follows: 


Section
10.1 Pipeline Structure
10.2 Pipeline Conflicts
     ■ Branch Conflicts
     ■ Register Conflicts
     ■ Memory Conflicts
10.3 Resolving Memory Conflicts
10.4 Clocking of Memory Accesses
     ■ Program Fetches
     ■ Data Loads and Stores
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10.1 Pipeline Structure 
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The five major units of the TMS320C40 pipeline structure and their functions 
are as follows: 


Fetch Unit (F)             Fetches the instruction words from
                           memory and updates the program counter
                           (PC).

Decode Unit (D)            Decodes the instruction word and performs
                           address generation. Also controls any
                           modification of the auxiliary registers and
                           the stack pointer.

Read Unit (R)              If required, reads the operands from
                           memory.

Execute Unit (E)           If required, reads the operands from the
                           register file, performs the necessary
                           operation, and writes results to the register
                           file. If required, results of previous
                           operations are written to memory.

DMA Coprocessor (DMA)      Reads and writes memory.


A basic instruction has four levels: fetch, decode, read, and execute.
Figure 10-1 illustrates these four levels of the pipeline structure. The levels
are indexed according to instruction and execution cycle. The perfect
overlap in the pipeline, where all four units operate in parallel, occurs at
cycle (m). Those levels about to be executed are at m+1, and those just
executed are at m-1. The TMS320C40 pipeline control allows a high-speed
execution rate of one execution per cycle. It also manages pipeline conflicts
so that they are transparent to the user. You do not need to take any special
precautions to guarantee correct operation.
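
The overlap of the four levels can be sketched by tabulating which
instruction occupies each level on each cycle of a conflict-free pipeline.
This is a generic four-stage model written for illustration; the function
name and the use of "-" for an empty level are assumptions.

```python
STAGES = ("F", "D", "R", "E")  # fetch, decode, read, execute


def ideal_pipeline(instructions):
    """Return one (F, D, R, E) tuple per cycle for a stall-free pipeline."""
    count = len(instructions)
    rows = []
    for cycle in range(count + len(STAGES) - 1):
        row = []
        for depth in range(len(STAGES)):
            # The instruction fetched `depth` cycles ago is now at this level.
            index = cycle - depth
            row.append(instructions[index] if 0 <= index < count else "-")
        rows.append(tuple(row))
    return rows


if __name__ == "__main__":
    print("cycle  " + "  ".join(STAGES))
    for cycle, row in enumerate(ideal_pipeline(["W", "X", "Y", "Z"])):
        print(f"{cycle:5d}  " + "  ".join(row))
```

For four instructions W, X, Y, and Z, the fourth row of the result is the
cycle in which all four levels are busy at once.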


Figure 10-1. TMS320C40 Pipeline Structure


          PIPELINE OPERATION
CYCLE     F      D      R      E
m-3       W      -      -      -
m-2       X      W      -      -
m-1       Y      X      W      -
m         Z      Y      X      W      Perfect overlap
m+1       -      Z      Y      X
m+2       -      -      Z      Y
m+3       -      -      -      Z


Notes: 1) W, X, Y, and Z represent instructions.
       2) F, D, R, E = fetch, decode, read, and execute, respectively.


Priorities from highest to lowest have been assigned to each of the
functional units as follows:


❏ DMA (if configured as highest priority)
❏ Execute
❏ Read
❏ Decode
❏ Fetch
❏ DMA (if configured as lowest priority).


When the processing of an instruction is ready to pass to the next higher
pipeline level, but that level is not ready to accept a new input, a pipeline
conflict occurs. In this case, the lower priority unit waits until the higher
priority unit completes its currently executing function.
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10.2 Pipeline Conflicts 


The pipeline conflicts of the TMS320C40 can be grouped into the following 
main categories: 


Branch Conflicts Involve most of those instructions or operations 
that read and/or modify the PC. 

Register Conflicts Involve delays that can occur when reading from 
or writing to registers that are used for address 
generation. 


Memory Conflicts Occur when the internal units of the 
TMS320C40 compete for memory resources. 


Each of these three types is discussed in the following sections. Examples
are included. Note that in these examples, when data is refetched or an
operation is repeated, the symbol representing the stage of the pipeline is
appended with a number. For example, if a fetch is performed again, the
initial fetch is labeled F1 and the refetch is labeled F2. When an access is
detained multiple cycles because of a not ready, the symbol RDY with an
overbar and the plain symbol RDY are used to indicate not ready and ready,
respectively.


10.2.1 Branch Conflicts 
10.2.1.1 Standard Branches 


The first class of pipeline conflicts occurs with standard (nondelayed) 
branches, i.e., BR, Bcond, DBcond, CALL, IDLE, RPTB, RPTS, RETIcond, 
RETScond, interrupts, and reset. Conflicts arise with these instructions and 
operations because during their execution, the pipeline is used only for the 
completion of the operation; other information fetched into the pipeline is 
discarded or refetched, or the pipeline is inactive. This is referred to as flush- 
ing the pipeline. Flushing the pipeline is necessary in these cases to guaran- 
tee that portions of succeeding instructions do not inadvertently get partially 
executed. TRAPcond and CALLcond are classified differently from the oth- 
er types of branches and are considered later. 


Example 10-1 shows the code and pipeline operation for a standard 
branch. Note that one dummy fetch is performed (F1), and then after the 
branch address is available, a new fetch (F2) is performed. This dummy 
fetch affects the cache. 
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Example 10-1. Standard Branch 


          BR      THREE     ; Unconditional branch
          MPYF              ; Not executed
          ADDF              ; Not executed
          SUBF              ; Not executed
          AND               ; Not executed
THREE     OR                ; Fetched after BR is fetched
          STI

PIPELINE OPERATION
PC        F       D       R       E       Notes
n         BR      -       -       -
n+1       MPYF    BR      -       -
n+1       (nop)   (nop)   BR      -       Fetch held for new PC value
n+1       (nop)   (nop)   (nop)   BR      THREE -> PC
THREE     OR      (nop)   (nop)   (nop)
          STI     OR      (nop)   (nop)

RPTS and RPTB both flush the pipeline, allowing the RS, RE, and RC
registers to be loaded at the proper time relative to the flow of the pipeline.
If these registers are loaded without the use of RPTS or RPTB, no flushing of the
pipeline occurs. If none of the repeat modes are being used, RS, RE, and 
RC may be used as general-purpose 32-bit registers without any pipeline 
conflicts occurring. In cases such as the nesting of RPTB due to nested in- 
terrupts, it may be necessary to load and store these registers directly while 
using the repeat modes. Since up to four instructions can be fetched before 
entering the repeat mode, loads should be followed by a branch to flush the 
pipeline. If the RC is changing when an instruction is loading it, the direct 
load takes priority over the modification made by the repeat mode logic. 
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10.2.1.2 Delayed Branches 


Delayed branches are implemented to guarantee the fetching of the next
three instructions. The delayed branches include BRD, BcondD, and
DBcondD. Example 10-2 shows the code and pipeline operation for a
delayed branch.


Example 10-2. Delayed Branch 


          BRD     THREE     ; Unconditional delayed branch
          MPYF              ; Executed
          ADDF              ; Executed
          SUBF              ; Executed
          AND               ; Not executed
THREE     MPYF              ; Fetched after SUBF is fetched

PIPELINE OPERATION
PC        F       D       R       E       Notes
n         BRD     -       -       -
n+1       MPYF    BRD     -       -       No execute delay
n+2       ADDF    MPYF    BRD     -
n+3       SUBF    ADDF    MPYF    BRD     THREE -> PC
THREE     MPYF    SUBF    ADDF    MPYF
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10.2.1.3 Delayed Branches With Annul Option 


In addition to standard and delayed branches, the 'C40 supports delayed
branches with an annulling option. These instructions include BcondAT
(branch conditional, annul if true) and BcondAF (branch conditional, annul
if false). The status of the condition (whether the cond specified is found
true or false) controls whether or not a branch is performed (as in a delayed
branch). The annulling operation cancels the effect of any operation
performed in the execute phase of the three instructions following the
BcondAT or BcondAF.

❏ If the condition is true, BcondAT annuls the effect of any operation
  performed in the execute phase of the three instructions that follow.

❏ If the condition is false, BcondAF annuls the effect of any operation
  performed in the execute phase of the three instructions that follow.


Example 10-3 uses both BcondAT and BcondAF. 


Example 10-3. Using BcondAF and BcondAT Instructions 


          LDI     *AR1,R0
          BNEGAT  bottom        ; If negative, branch and
          ADDI    *++AR2,R3     ; annul the execute phase
          MPYF                  ; of ADDI, MPYF, and NOT.
          NOT                   ; Otherwise, don't annul and
top:      SUBF                  ; continue with SUBF.
          SUBI    1,R0
          BNNAF   top           ; If not negative, branch and
          ADDI    *++AR2,R3     ; do not annul the execute
          MPYF                  ; phase of ADDI, MPYF, and
          NOT                   ; NOT. Otherwise, annul ADDI,
bottom:   XOR                   ; MPYF, and NOT, and continue
                                ; with XOR.


At the start of Example 10-3, if the result of the load is negative (a true
condition), the BNEGAT instruction causes a branch and also an annulment
of the execute phase of the three instructions that follow it. As a result, the
execute phase of the ADDI instruction does not occur, and register R3 is not
updated by addition. However, the incrementing of AR2 and the reading of
the data at the corresponding address do occur because these operations
are in the decode and read phases of the pipeline, respectively, and thus are
not annullable.
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In short, operations that are annullable are: 

❏ all writes to the register file that occur in the execute phase (ADDs, LDs,
  etc., but do not include LDA, LDPK, etc.).

❏ all stores to memory.
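
The branch and annul decisions for the two instruction forms can be written
as a small truth function. This is a sketch of the rules stated in this
subsection only; the function name and the "AT"/"AF" string encoding are
assumptions for the example.

```python
def annulling_branch(kind, condition):
    """Return (branch_taken, annul_next_three) for BcondAT or BcondAF.

    kind: "AT" (annul if true) or "AF" (annul if false).
    condition: truth value of the tested condition.
    """
    if kind not in ("AT", "AF"):
        raise ValueError("kind must be 'AT' or 'AF'")
    branch_taken = bool(condition)           # branch as in a delayed branch
    if kind == "AT":
        annul = branch_taken                 # annul when the condition is true
    else:
        annul = not branch_taken             # annul when the condition is false
    return branch_taken, annul
```

Only the execute phase of the three following instructions is affected when
`annul` is true; decode-phase updates such as the AR2 increment in
Example 10-3 still take place.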


10.2.2 Register Conflicts 
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Register conflicts involve the reading or writing of registers used for ad- 
dressing purposes. These registers are AR0-AR7, IR0, IR1, BK, DP, and
SP. These conflicts occur when the pertinent register is not ready to be used.
If an instruction writes to one of these registers, the decode unit cannot use
that same register until the write is complete, i.e., until instruction execution
is completed.


In Example 10-4, an auxiliary register is loaded, and the same auxiliary reg- 
ister is used on the next instruction. Since the decode stage needs the result 
of the write to the auxiliary register, the decode of this second instruction is 
delayed two cycles. Every time the decode is delayed, a refetch of the pro- 
gram word is performed; i.e., the first fetch of ADDF is at F1, followed by F2 
and F3 (the final fetch). Since these are actual refetches, they can cause 
not only conflicts with the DMA controller but also cache hits and misses. 
If the MPYF instruction used a different auxiliary register than the one
loaded by the LDI instruction, no delay would occur.
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Example 10-4. Write to an AR Followed by an AR for Address Generation 


          LDI     7,AR2       ; 7 -> AR2
NEXT      MPYF    *AR2,R0     ; Decode delayed 2 cycles
          ADDF
          FLOAT

PIPELINE OPERATION
PC        F       D       R       E             Notes
n         LDI     -       -       -
n+1       MPYF    LDI     -       -
n+2       ADDF    MPYF    LDI     -             Decode/address generation held
n+2       ADDF    MPYF    (nop)   LDI 7,AR2     AR2 loaded
n+2       ADDF    MPYF    (nop)   (nop)
n+3       FLOAT   ADDF    MPYF    (nop)


The case for reads of these registers is similar to the case for writes. If an
instruction must read registers AR0-AR7 or SP, the use of those particular
registers by the decode for the following instruction is delayed until the read
is complete. The registers are read at the start of the execute cycle and
therefore require only a one-cycle delay of the following decode. For four
registers (IR0, IR1, BK, or DP), no delay is incurred upon a read.
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In Example 10—5, two auxiliary registers are added together with the result 
going to an extended-precision register. The next instruction uses one of the 
same auxiliary registers as an address register. If the MPYF instruction used
an AR register other than AR0 or AR2, no delay would occur.


Example 10-5. A Read of ARs Followed by ARs for Address Generation 


          ADDI    AR0,AR2,R1   ; AR0 + AR2 -> R1
NEXT      MPYF    *++AR2,R0    ; Decode delayed 1 cycle
          ADDF
          FLOAT

PIPELINE OPERATION
PC        F       D       R       E                  Notes
n         ADDI    -       -       -
n+1       MPYF    ADDI    -       -
n+2       ADDF    MPYF    ADDI    -                  Decode/address generation
                                                     held until AR is read
n+2       ADDF    MPYF    (nop)   ADDI AR0,AR2,R1    ARs read
n+3       FLOAT   ADDF    MPYF    (nop)


The DBR (decrement and branch) instruction’s use of auxiliary registers for 
loop counters is treated the same as if the use were for addressing. There- 
fore, the operation shown in the two previous examples can also occur for 
this instruction. 
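
The delays in the two previous examples can be summarized as a rule of
thumb. The sketch below generalizes from Examples 10-4 and 10-5 (two cycles
after a write, one cycle after a read of AR0-AR7 or SP, none after a read
of IR0, IR1, BK, or DP); treating every addressing-register write as a
two-cycle delay is an assumption drawn from Example 10-4, and the function
and set names are hypothetical.

```python
AUX_REGS = {f"AR{i}" for i in range(8)}
ADDRESSING_REGS = AUX_REGS | {"IR0", "IR1", "BK", "DP", "SP"}
READ_DELAY_REGS = AUX_REGS | {"SP"}   # reads of these delay the next decode


def decode_delay(prev_writes, prev_reads, next_address_regs):
    """Cycles the decode of the next instruction is held (0, 1, or 2).

    prev_writes / prev_reads: registers written / read by the previous
    instruction. next_address_regs: registers the next instruction uses
    for address generation.
    """
    used = set(next_address_regs)
    if used & set(prev_writes) & ADDRESSING_REGS:
        return 2   # wait until the write completes in the execute phase
    if used & set(prev_reads) & READ_DELAY_REGS:
        return 1   # registers are read at the start of the execute cycle
    return 0
```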
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10.2.3 Memory Conflicts 


Memory conflicts can occur when the memory bandwidth of a physical 
memory space is exceeded. For example, RAM blocks 0 and 1 and the 
ROM block can support only two accesses every cycle. The external inter- 
face can support only one access per cycle. Some conditions under which 
memory conflicts can be avoided are discussed in Section 10.3. 


Memory pipeline conflicts consist of the following four types: 


Program Wait A program fetch is prevented from begin- 
ning. 


Program Fetch Incomplete A program fetch has begun but is not yet 
complete. 


Execute Only An instruction sequence requires three 
CPU data accesses in a single cycle. 


Hold Everything            A primary or expansion bus operation
                           must complete before another one can
                           proceed.


These four types of memory conflicts are illustrated in examples and
discussed in the paragraphs that follow.


Program Wait 
Two conditions can prevent the program fetch from beginning:

❏ The start of a CPU data access when

  ■ Two CPU data accesses are made to an internal RAM or ROM
    block, and a program fetch from the same block is necessary.

  ■ One of the external ports is starting a CPU data access, and a
    program fetch from the same port is necessary.

❏ A multicycle CPU data access or DMA data access over the external
  bus is needed.
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Example 10-6 illustrates a program wait until a CPU data access 
completes. In this case, *AR0 and *AR1 are both pointing to data in RAM
block 0, and the MPYF instruction will be fetched from RAM block 0. This
results in the conflict shown. Since no more than two accesses can be made
to RAM block 0 in a single cycle, the program fetch cannot begin and must
wait until the CPU data accesses are complete.
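
The bandwidth rule behind this example can be sketched as a simple check:
a program fetch must wait when CPU data accesses already use all of the
accesses that the same memory space supports in one cycle. The block names
and the function are hypothetical labels for illustration; the capacities
(two accesses per cycle for an internal block, one for an external
interface) are those stated in Section 10.2.3.

```python
# Accesses each memory space supports per cycle (Section 10.2.3).
# "EXT0" and "EXT1" are hypothetical names for the external interfaces.
ACCESSES_PER_CYCLE = {"RAM0": 2, "RAM1": 2, "ROM": 2, "EXT0": 1, "EXT1": 1}


def program_fetch_must_wait(data_access_spaces, fetch_space):
    """True if the fetch cannot begin in the same cycle as the data accesses.

    data_access_spaces: memory spaces touched by the CPU data accesses
    starting this cycle (at most two). CPU data accesses are served first.
    """
    in_use = list(data_access_spaces).count(fetch_space)
    return in_use >= ACCESSES_PER_CYCLE[fetch_space]
```

The first assertion-style case below corresponds to Example 10-6, where
both operands and the fetched instruction reside in RAM block 0.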


Example 10-6. Program Wait Until CPU Data Access Completes 
          ADDF3   *AR0,*AR1,R0
          FIX
          MPYF
          ADDF3
          NEGB

PIPELINE OPERATION
PC        F        D       R       E                     Notes
n         ADDF3    -       -       -
n+1       FIX      ADDF3   -       -
n+2       (wait)   FIX     ADDF3   -                     Fetch held until ARs
                                                         are read
n+2       MPYF     (nop)   FIX     ADDF3 *AR0,*AR1,R0    ARs read
n+3       ADDF3    MPYF    (nop)   FIX
n+4       NEGB     ADDF3   MPYF    (nop)


Example 10-7 shows a program wait due to a multicycle data-data access
or a multicycle DMA access. The ADDF, MPYF, and SUBF are fetched from
some portion in memory other than the external port the DMA requires. The
DMA begins a multicycle access. The program fetch corresponding to the
CALL is made to the same external port the DMA is using.


Even if the DMA was configured as the lowest priority, a multicycle access
cannot be aborted. The program fetch must therefore wait until the DMA
access completes.
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Example 10-7. Program Wait Due to Multicycle Access 


PIPELINE OPERATION
PC        F        D       R       E       Notes
n         ADDF     -       -       -
n+1       MPYF     ADDF    -       -
n+2       SUBF     MPYF    ADDF    -
n+3       (wait)   SUBF    MPYF    ADDF    2-cycle DMA access
n+3       CALL     (nop)   SUBF    MPYF
n+4       -        CALL    (nop)   SUBF


Program Fetch Incomplete


A program fetch incomplete occurs when a program fetch takes more than 
one cycle to complete due to wait states. In Example 10-8, the MPYF and 
ADDF are fetched from memory that supports single-cycle accesses. The 
SUBF is fetched from memory requiring one wait state. One example that 
demonstrates this conflict is a fetch across a bank boundary on the primary
port.


Example 10-8. Multicycle Program Memory Fetches 


PIPELINE OPERATION
PC        F       D       R       E       Notes
n         MPYF    -       -       -
n+1       ADDF    MPYF    -       -
n+2       SUBF    ADDF    MPYF    -       Not ready (wait state required)
n+2       SUBF    (nop)   ADDF    MPYF    Ready (RDY)
n+3       ADDI    SUBF    (nop)   ADDF
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Execute Only 


The Execute Only type of memory pipeline conflict occurs when a sequence 
of instructions requires three CPU data accesses in a single cycle. There 
are two cases in which this occurs: 


❏ An instruction performs a store and is followed by an instruction that
  does two memory reads.


❏ An instruction performs two stores and is followed by an instruction that
  performs at least one memory read.


The first case is shown in Example 10-9. Since this sequence requires 
three data memory accesses and only two are available, only the execute 
phase of the pipeline is allowed to proceed. The dual reads required by the
LDF || LDF are delayed one cycle. Note that a refetch of the next instruction
can occur.
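
The two cases listed for this conflict reduce to one counting rule: the
stores of one instruction and the reads of the next compete for the two CPU
data accesses available per cycle. The sketch below expresses that rule;
the function name and the limit of two stores or two reads per instruction
are assumptions made for the example.

```python
CPU_DATA_ACCESSES_PER_CYCLE = 2


def execute_only_conflict(stores_in_first, reads_in_second):
    """True if a store instruction followed by a read instruction stalls.

    stores_in_first: memory stores performed by the first instruction (0-2).
    reads_in_second: memory reads performed by the next instruction (0-2).
    """
    for value in (stores_in_first, reads_in_second):
        if value not in (0, 1, 2):
            raise ValueError("an instruction performs 0, 1, or 2 accesses")
    # Three or more data accesses would be needed in a single cycle.
    return stores_in_first + reads_in_second > CPU_DATA_ACCESSES_PER_CYCLE
```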


Example 10-9. Single Store Followed by Two Reads 


          STF     R0,*AR1      ; R0 -> *AR1
          LDF     *AR2,R1      ; *AR2 -> R1 in parallel with
||        LDF     *AR3,R2      ; *AR3 -> R2

PIPELINE OPERATION
PC        F           D           R           E
n         STF         -           -           -
n+1       LDF||LDF    STF         -           -
n+2       W           LDF||LDF    STF         -
n+3       X           W           LDF||LDF    STF R0,*AR1
n+4       X           W           LDF||LDF    (nop)
n+4       Y           X           W           LDF||LDF *AR2,R1 and *AR3,R2

Note: The write must complete before the two reads can complete.
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Example 10-10 shows a parallel store followed by a single load or read. 
Since the two parallel stores are required, the next CPU data memory read
must wait a cycle before beginning. One program memory refetch may
occur.


Example 10-10. Parallel Store Followed by Single Read


        STF   R0,*AR0        ; R0 -> *AR0 in parallel with
     || STF   R2,*AR1        ; R2 -> *AR1
        ADDF  @SUM,R1        ; R1 + @SUM -> R1
        IACK
        ASH

                      PIPELINE OPERATION
PC     F          D          R          E
n      STF||STF   -          -          -
n+1    ADDF       STF||STF   -          -
n+2    IACK       ADDF       STF||STF   -
n+3    ASH        IACK       ADDF       STF||STF   R0,*AR0 and R2,*AR1
n+4    ASH        IACK       ADDF       (nop)
n+4    -          ASH        IACK       ADDF

The ADDF read must wait until the writes are complete.
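
Both Execute Only cases come down to one count: the stores in the execute phase and the reads in the read phase share two data accesses per cycle. The following Python sketch models that rule under the assumptions stated in the text above; the function name and constant are hypothetical and are not part of any TI tool.

```python
# Illustrative model only; names are hypothetical, not TI software.
DATA_ACCESSES_PER_CYCLE = 2  # "only two are available", per the text


def read_delay_cycles(stores_in_execute, reads_in_read_phase):
    """Return how many cycles the pending reads are delayed (0 or 1).

    The stores of the earlier instruction (execute phase) go first.
    If stores plus reads exceed the two available data accesses, only
    the execute phase proceeds and the reads wait one cycle.
    """
    total = stores_in_execute + reads_in_read_phase
    return 1 if total > DATA_ACCESSES_PER_CYCLE else 0
```

For instance, `read_delay_cycles(1, 2)` models Example 10-9 and `read_delay_cycles(2, 1)` models Example 10-10; a single store followed by a single read needs no delay.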
Hold Everything


There are three types of Hold Everything memory pipeline conflicts:

- A CPU data load or store cannot be performed because an external port
  is busy.

- An external load takes more than one cycle.

- Conditional calls and traps.


The first type of Hold Everything conflict occurs when one of the external
ports is busy because of an access that has started but is not complete. In
Example 10-11, the first store is a two-cycle store. The CPU writes the data
to an external port. The port control then takes two cycles to complete the
data write. The LDF is a read over the same external port. Since the
store is not complete, the CPU continues to attempt the LDF until the port
is available.


Example 10-11. Busy External Port


        STF   R0,@DMA1
        LDF   @DMA2,R0

                      PIPELINE OPERATION
PC     F      D      R       E
n      STF    -      -       -
n+1    LDF    STF    -       -
n+2    W      LDF    STF     -
n+2    W      LDF    (nop)   STF      \ 2-cycle external bus
n+2    W      LDF    (nop)   (nop)    / write access
n+3    X      W      LDF     (nop)
n+4    Y      X      W       LDF


The second type of Hold Everything conflict involves multicycle data reads. 
The read has begun and continues until completed. In Example 10-12, the 
LDF is performed from an external memory that requires several cycles to 
complete. 
Example 10-12. Multicycle Data Reads


        LDF   @DMA,R0

                      PIPELINE OPERATION
PC     F           D      R      E
n      LDF         -      -      -
n+1    I           LDF    -      -
n+2    J           I      LDF    -        \ 2-cycle external bus
n+3    K (dummy)   I      LDF    -        / read access
n+3    K           J      I      LDF


The final type of Hold Everything conflict deals with conditional calls and 
traps, which are different from the other branch instructions. Whereas the 
other branch instructions are conditional loads, the conditional calls and 
traps are conditional stores, which take one more cycle than a conditional 
branch (see Example 10-13). The added cycle is used to push the return
address after the call condition is evaluated. 


Example 10-13. Conditional Calls and Traps


                      PIPELINE OPERATION
PC              F          D          R          E
n               CALLcond   -          -          -
n+1             I          CALLcond   -          -
n+1             (nop)      (nop)      CALLcond   -
n+1             (nop)      (nop)      (nop)      CALLcond
n+1             (nop)      (nop)      (nop)      CALLcond    PC store cycle
n+2/CALLaddr    I          (nop)      (nop)      (nop)
10.3 Resolving Memory Conflicts


If program fetches and data accesses are performed in such a manner that 
the resources being used cannot provide the necessary bandwidth, the 
program fetch is delayed until the data access is complete. Certain 
configurations of program fetch and data accesses yield conditions under 
which the TMS320C40 can achieve maximum throughput. 


Table 10-1 shows how many accesses can be performed from the different
memory spaces when it is necessary to do a program fetch and a single data
access, and still achieve maximum performance (one cycle). Four cases
achieve one-cycle maximization.


Table 10-1. One Program Fetch and One Data Access for Maximum Performance

[Table body not recoverable. Legible column headings: Accesses From
Dual-Access Internal Memory; Accesses From Local Bus or Peripheral.
Legible entry: 2 from any combination of internal memory.]


Table 10-2 shows how many accesses can be performed from the different
memory spaces when it is necessary to do a program fetch and two data
accesses, still achieving maximum performance (one cycle). Six cases
achieve this maximization.


Table 10-2. One Program Fetch and Two Data Accesses for Maximum Performance

[Table body only partly recoverable. Legible column headings: Accesses
From Dual-Access Internal Memory; Local or Global Bus or Peripheral Bus.
Legible entries:]

    3 from different internal memory blocks
    2 from any combination of internal memory
    2 from same internal memory block and 1 from a different internal
      memory block
    2 from any combination of internal memory
    7    1 program    2 data    DMA
    8    1 DMA        2 data    1 program

† For Cases 2 and 3, see Three-Operand Instruction Memory Reads on
  page 10-21.
10.4 Clocking of Memory Accesses


Internal clock phases (H1 and H3) and their relationship to memory ac- 
cesses are discussed in this section to show how the TMS320C40 handles 
multiple memory accesses. Whereas the previous section discussed the in- 
teraction between sequences of instructions, this section discusses the flow 
of data on an individual instruction basis. 


Each major clock period of 40 ns is composed of two minor clock periods 
of 20 ns, labeled H3 and H1 (these times assume a 50-MHz ’C40). The ac- 
tive clock period for H3 and H1 is the time when that signal is high. 


[Figure: one major clock period, showing the H1 and H3 clock waveforms]
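
The timing figures quoted above can be reproduced with a short calculation. This Python sketch assumes, as the text does for a 50-MHz device, that one minor period equals one period of the quoted clock rate and that a major period is two minor periods; the function name is hypothetical, and scaling to other clock rates is an assumption rather than a statement from this manual.

```python
def clock_periods_ns(clock_mhz):
    """Return (minor_period_ns, major_period_ns) for a given clock rate.

    Assumption: the minor period (H3 or H1 active time) is 1000 / MHz
    nanoseconds, and one major period spans two minor periods.
    """
    minor = 1000.0 / clock_mhz
    return minor, 2.0 * minor
```

With `clock_periods_ns(50)` the sketch returns a 20-ns minor period and a 40-ns major period, matching the values given in the text.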


The precise operation of memory reads and writes can be defined according
to these minor clock periods. The types of memory operations that can occur
are program fetches, data loads and stores, and DMA accesses.


10.4.1 Program Fetches 
Internal program fetches are always performed during H3 unless a single
data store must occur at the same time because of another instruction in the
pipeline. In this case, the program fetch occurs during H1 and the data store
during H3.


External program fetches always start at the beginning of H3 with the ad- 
dress being presented on the external bus. At the end of H1, they are com- 
pleted with the latching of the instruction word. 
10.4.2 Data Loads and Stores


Four types of instructions perform loads, memory reads, and stores: two-op- 
erand instructions, three-operand instructions, multiplier/ALU operation 
with store instructions, and parallel multiply and add instructions. See Chap- 
ter 5 for detailed information on addressing modes. 


As discussed in Chapter 7, the number of bus cycles for external memory 
accesses differs in some cases from the number of CPU execution cycles. 
For external reads, the number of bus cycles and CPU execution cycles is 
identical. For external writes, there are always at least two bus cycles, but 
unless there is a port access conflict, there is only one CPU execution cycle. 
In the following examples, any difference in the number of bus cycles and 
CPU cycles is noted. 


Two-Operand Instruction Memory Accesses 
Figure 10-2. Two-Operand Instruction Word


Two-operand instructions include all instructions with bits 31-29 equal
to 000₂ or 010₂ (see Figure 10-2). In the case of a data read, bits 15-0
represent the src operand. Internal data reads are always performed during
H1. External data reads always start at the beginning of H3 with the
address being presented on the external bus, and they complete with the
latching of the data word at the end of H1.


In the case of a data store, bits 15-0 represent the dst operand. Internal
data stores are performed during H3. External data stores always start at
the beginning of H3 with the address and data being presented on the
external bus.


Three-Operand Instruction Memory Reads 
Figure 10-3. Three-Operand Instruction Word 
Three-operand instructions include all instructions with bits 31-29 equal
to 001₂ (see Figure 10-3). The source operands, src1 and src2, come from
either registers or memory. When one or more of the source operands are
from memory, these instructions are always memory reads.
If only one of the source operands is from memory (either src1 or src2) and
is located in internal memory, the data is read during H1. If the single
memory source operand is in external memory, the read starts at the
beginning of H3, with the address being presented on the external bus, and
completes with the latching of the data word at the end of H1.


If both source operands are to be fetched from memory, then several cases
occur. If both operands are located in internal memory, the src1 read is
performed during H3 and src2 during H1, thus completing two memory reads
in a single cycle.


If src1 is in internal memory and src2 is in external memory, the src2
access begins at the start of H3 and latches at the end of H1. At the same
time, the src1 access to internal memory is performed during H3. Again, two
memory reads are completed in a single cycle.


If src1 is in external memory and src2 is in internal memory, two cycles are
necessary to complete the two reads. In the first cycle, the internal src2
access is performed. The src1 access is also performed, but not latched
until the next H3.


If src1 and src2 are both from external memory, two cycles are required to
complete the two reads. In the first cycle, the src1 access is performed and
loaded on the next H3; in the second cycle, the src2 access is performed
and loaded on that cycle's H1.


Operations with Parallel Stores

Figure 10-4. Multiply or CPU Operation With a Parallel Store
The next class of instructions includes all instructions that have a store in
parallel with another instruction. Bits 31 and 30 for these instructions are
equal to 11₂.


For operations that perform a multiply or ALU operation in parallel with a
store, the instruction word format is shown in Figure 10-4. Whether the
store operation to dst2 is external or internal, it is performed during H3.
Two bus cycles are required for external stores, but only one CPU cycle is
necessary to complete the write.


If the memory read operation is external, it starts at the beginning of H3 and
latches at the end of H1. If the memory read operation is internal, it is
performed during H1. Note that memory reads are performed by the CPU
during the read (R) phase of the pipeline, and stores are performed during
the execute (E) phase.


The instruction word format for instructions that have parallel stores to
memory is shown in Figure 10-5. If both destination operands, dst1 and
dst2, are located in internal memory, dst1 is stored during H3 and dst2
during H1, thus completing two memory stores in a single cycle.


Figure 10-5. Two Parallel Stores 
If dst1 is in external memory and dst2 is in internal memory, the dst1 store
begins at the start of H3. The dst2 store to internal memory is performed
during H1. Two bus cycles are required for the external store, but only one
CPU cycle is necessary to complete the write. Again, two memory stores are
completed in a single cycle.


If dst1 is in internal memory and dst2 is in external memory, an additional
bus cycle is necessary to complete the dst2 store. Only one CPU cycle is
necessary to complete the write, but the port access requires three bus
cycles. In the first cycle, the internal dst1 store is performed during H3, and
dst2 is written to the port during H1. During the next cycle, the dst2 store is
performed on the external bus, beginning in H3, and executes as normal
through the following cycle.


If dst1 and dst2 are both written to external memory, a single CPU cycle is
still all that is necessary to complete the stores. In this case, four bus cycles
are required.


1) In the first cycle, both dst1 and dst2 are written to the port, and the
   external bus access for dst1 begins.


2) The store for dst1 is completed on the second cycle, and the store for
   dst2 begins on the third external bus cycle.


3) Finally, the store for dst2 is completed on the fourth external bus cycle.
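
The four parallel-store cases described above can be summarized as a lookup from operand location to cycle counts. This Python sketch is only a restatement of the text; the function name is hypothetical, and the value of zero external bus cycles for two internal stores is an assumption (the text does not state it explicitly).

```python
def parallel_store_cost(dst1_external, dst2_external):
    """Return (cpu_cycles, external_bus_cycles) for two parallel stores."""
    cpu_cycles = 1  # every case in the text completes in one CPU cycle
    if not dst1_external and not dst2_external:
        bus_cycles = 0  # assumption: internal stores use no external bus
    elif dst1_external and not dst2_external:
        bus_cycles = 2  # external dst1 store, internal dst2 store in H1
    elif not dst1_external and dst2_external:
        bus_cycles = 3  # dst2 reaches the port in H1, bus starts next cycle
    else:
        bus_cycles = 4  # two back-to-back external stores
    return cpu_cycles, bus_cycles
```

The CPU cost stays at one cycle in every case; only the external port stays busy longer, which is what can trigger a Hold Everything conflict for a following access to the same port.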
Parallel Multiplies and Adds


Memory addressing for parallel multiplies and adds is similar to that for
three-operand instructions. The parallel multiplies and adds include all
instructions with bits 31-30 equal to 10₂ (see Figure 10-6).
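
The four instruction classes in this section are distinguished by the top bits of the 32-bit instruction word. The following Python sketch decodes only those bit patterns as stated in the text; the function name and the returned labels are hypothetical, and any pattern the text does not mention is reported as "other".

```python
def classify_instruction(word):
    """Classify a 32-bit instruction word by bits 31-29 (or 31-30)."""
    top3 = (word >> 29) & 0b111
    top2 = (word >> 30) & 0b11
    if top2 == 0b11:
        return "parallel store"
    if top2 == 0b10:
        return "parallel multiply and add"
    if top3 in (0b000, 0b010):
        return "two-operand"
    if top3 == 0b001:
        return "three-operand"
    return "other"  # pattern not covered by this section
```

A decoder like this is enough to pick the right timing rules from this section before looking at any operand fields.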


Figure 10-6. Parallel Multiplies and Adds 
For these operations, src3 and src4 are both located in memory. If both
operands are located in internal memory, the src3 read is performed during
H3, and the src4 read is performed during H1, thus completing two memory
reads in a single cycle.


If src3 is in internal memory and src4 is in external memory, the src4 access
begins at the start of H3 and latches at the end of H1. At the same time, the
src3 access to internal memory is performed during H3. Again, two memory
reads are completed in a single cycle.


If src3 is in external memory and src4 is in internal memory, two cycles are
necessary to complete the two reads. In the first cycle, the internal src4
access is performed. During the H3 of the next cycle, the src3 access is
performed.


If src3 and src4 are both from external memory, two cycles are necessary
to complete the two reads. In the first cycle, the src3 access is performed;
in the second cycle, the src4 access is performed.
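
The dual-read cases above (src3/src4 here, and likewise src1/src2 for three-operand instructions) can be tabulated directly. This Python sketch simply encodes the four cases from the text; the table and function names are hypothetical.

```python
# Keys: (first_operand_external, second_operand_external)
# "first" is src1 or src3; "second" is src2 or src4.
DUAL_READ_CYCLES = {
    (False, False): 1,  # first read in H3, second read in H1
    (False, True): 1,   # external second read spans H3-H1, first in H3
    (True, False): 2,   # external first operand needs a second cycle
    (True, True): 2,    # one external access per cycle
}


def dual_read_cycles(first_external, second_external):
    """Return the cycles needed to read both memory operands."""
    return DUAL_READ_CYCLES[(bool(first_external), bool(second_external))]
```

The table makes one design consequence visible: when only one operand can be placed in internal memory, placing the first operand there keeps the instruction at a single cycle.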


Chapter 11


Assembly Language Instructions


The TMS320C40 assembly language instruction set supports
numeric-intensive, signal processing, and general-purpose applications.
The instructions are organized into these major groups: load-and-store,
two- or three-operand arithmetic/logical, parallel, program control, and
interlocked operations instructions. The addressing modes used with the
instructions are described in Chapter 5.


The TMS320C40 instruction set can also use one of 20 condition codes with
any of the 10 conditional instructions, such as LDFcond. This chapter
defines the condition codes and flags.


The assembler allows optional syntax forms to simplify the assembly 
language for special-case instructions. These optional forms are listed and 
explained. 


Each of the individual instructions is described and listed in alphabetical 
order. An example instruction (on pages 11-15 through 11-17) 
demonstrates the special format used and explains its content. 


This chapter discusses the following major topics:


Section                                                         Page
11.1  Instruction Set                                           11-3
        Load-and-Store Instructions                             11-3
        Two-Operand Arithmetic/Logical Instructions             11-4
        Three-Operand Arithmetic/Logical Instructions           11-6
        Program Control Instructions
        Interlocked Operations Instructions
        Parallel Operations Instructions
11.2  Condition Codes and Flags                                 11-10
11.3  Individual Instructions                                   11-13
        Symbols and Abbreviations Used in Instructions          11-13
        Optional Assembler Syntaxes                             11-15
        Individual instruction descriptions, alphabetized
        (includes syntax, operation, operands, encoding,
        description, cycles, status bits, mode bit, examples)   11-17
11.1 Instruction Set


The TMS320C40 instruction set is exceptionally well-suited to digital signal 
processing and other numeric-intensive applications. All instructions are a 
single machine word long, and most instructions take a single cycle to ex- 
ecute. In addition to multiply and accumulate instructions, the TMS320C40 
possesses a full complement of general-purpose instructions. 


The instruction set contains 135 instructions organized into the following 
functional groups: 


- Load-and-store
- Two-operand arithmetic/logical
- Three-operand arithmetic/logical
- Program control
- Interlocked operations
- Parallel operations


Each of these groups is discussed in the succeeding subsections. 


11.1.1 Load-and-Store Instructions 


The TMS320C40 supports 23 load-and-store instructions (see Table 11-1). 
These instructions can 


- Load a word from memory into a register,
- Store a word from a register into memory, or
- Manipulate data on the system stack.


Two of these instructions can load data conditionally. This is useful for
locating the maximum or minimum value in a data set. See Section 11.2 for
detailed information on condition codes.
Table 11-1. Load-and-Store Instructions


Instruction   Description

LDEP          Load integer, expansion file register to primary register
LDF           Load floating-point value
LDFcond       Load floating-point value conditionally
LDHI          Load 16-bit unsigned immediate into 16 MSBs
LDI           Load integer
LDIcond       Load integer conditionally
LDM           Load floating-point mantissa
LDPE          Load integer, primary register to expansion file register
LWRct         Load word right-shifted
POPF          Pop floating-point value from stack
PUSH          Push integer on stack
PUSHF         Push floating-point value on stack
STF           Store floating-point value
STI           Store integer
STIK          Store integer immediate


11.1.2 Two-Operand Instructions


The TMS320C40 supports a complete set of 43 two-operand arithmetic and
logical instructions. The two operands are the source and destination. The
source operand may be a memory word, a register, or a constant. The
destination operand is always a register.


These instructions provide integer, floating-point, or logical operations, 
and multiprecision arithmetic. Table 11-2 lists these instructions. 
Table 11-2. Two-Operand Instructions


Instruction   Description

ANDN†         Bitwise logical-AND with complement
ASH†          Arithmetic shift
CMPF†         Compare floating-point values
CMPI†         Compare integers
FIX           Convert floating-point value to integer
FLOAT         Convert integer to floating-point value
FRIEEE        Convert IEEE floating-point format to twos-complement
              floating-point format
MPYF†         Multiply floating-point values
MPYI†         Multiply integers
MPYSHI†       Multiply signed integer, 32-MSB
MPYUHI†       Multiply unsigned integer, 32-MSB
NEGB          Negate integer with borrow
NEGF          Negate floating-point value
NEGI          Negate integer
NORM          Normalize floating-point value
NOT           Bitwise logical-complement
OR†           Bitwise logical-OR
RCPF          Reciprocal floating point
RND           Round floating-point value
ROL           Rotate left
ROLC          Rotate left through carry
ROR           Rotate right
RORC          Rotate right through carry
RSQRF         Reciprocal of square root, floating point
SUBB†         Subtract integers with borrow
SUBC          Subtract integers conditionally
SUBRB         Subtract reverse integer with borrow
TOIEEE        Convert twos-complement floating-point format to IEEE
              format
TSTB†         Test bit fields
XOR†          Bitwise exclusive-OR

† Two- and three-operand versions
11.1.3 Three-Operand Instructions


Most instructions contain two or three operands. The 19 three-operand in- 
structions allow the TMS320C40 to read two operands from memory or the 
CPU register file in a single cycle and store the results in a register. The fol- 
lowing differentiates the two- and three-operand instructions: 


- Two-operand instructions have a single source operand (or shift count)
  and a destination operand.

- Three-operand instructions may have two source operands (or one
  source operand and a count operand) and a destination operand. A
  source operand may be a memory word, a register, or a constant. The
  destination of a three-operand instruction is always a register.


Table 11-3 lists the instructions that have three-operand versions. Note
that the 3 in the mnemonic can be omitted from three-operand instructions
(see subsection 11.3.2).


Table 11-3. Three-Operand Instructions


Instruction   Description

ADDF3         Add floating-point values
ADDI3         Add integers
AND3          Bitwise logical-AND
MPYF3         Multiply floating-point values
MPYI3         Multiply integers
MPYSHI3       Multiply signed integer, 32-MSB
MPYUHI3       Multiply unsigned integer, 32-MSB
OR3           Bitwise logical-OR
SUBF3         Subtract floating-point values


11.1.4 Program Control Instructions 


The program-control instruction group consists of all of those instructions
(23) that affect program flow. The repeat mode allows repetition of a block
of code (RPTB and RPTBD) or of a single line of code (RPTS). Both standard
and delayed (single-cycle) branching are supported. Several of the
program control instructions are capable of conditional operations (see
Section 11.2 for detailed information on condition codes). Table 11-4 lists
the program control instructions.


Table 11-4. Program Control Instructions


Instruction   Description

Bcond         Branch conditionally (standard)
BcondAF       Branch conditionally delayed and annul if false
BcondAT       Branch conditionally delayed and annul if true
BR            Branch unconditionally (standard)
CALLcond      Call subroutine conditionally
LAJcond       Link and jump conditional
LATcond       Link and trap conditional
RPTBD         Repeat block of instructions (delayed)
SWI           Software interrupt
TRAPcond      Trap conditionally


11.1.5 Interlocked Operations Instructions 


The interlocked operations instructions support multiprocessor communi- 
cation and the use of external signals to allow for powerful synchronization 
mechanisms. They also guarantee the integrity of the communication and 
result in a high-speed operation. Refer to Chapter 7 for examples of the use 
of interlocked instructions. 


Table 11-5. Interlocked Operations Instructions


Instruction   Description

LDFI          Load floating-point value, interlocked
LDII          Load integer, interlocked
SIGI          Signal, interlocked
STFI          Store floating-point value, interlocked
STII          Store integer, interlocked
11.1.6 Parallel Operations Instructions


The parallel-operations instructions group makes a high degree of
parallelism possible. Some of the TMS320C40 instructions can occur in pairs
that will be executed in parallel. These instructions offer the following
features:

- Parallel loading of registers,
- Parallel arithmetic operations, or
- Arithmetic/logical instructions used in parallel with a store instruction.


Each instruction in a pair is entered as a separate source statement. The
second instruction in the pair must be preceded by two vertical bars (||).
Table 11-6 lists the valid instruction pairs.
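
The source-statement rule above (the second instruction of a pair starts with ||) is easy to illustrate with a small grouping routine. The Python sketch below is not the TI assembler and makes simplifying assumptions: comments start with a semicolon, labels are not handled, and the pairing itself is not checked against the list of valid pairs. The function name is hypothetical.

```python
def group_parallel(lines):
    """Group source statements; a '||' statement joins the previous one."""
    groups = []
    for raw in lines:
        text = raw.split(";", 1)[0].strip()  # drop any trailing comment
        if not text:
            continue  # skip blank and comment-only lines
        if text.startswith("||"):
            if not groups:
                raise ValueError("'||' statement has no preceding instruction")
            groups[-1].append(text[2:].strip())
        else:
            groups.append([text])
    return groups
```

Applied to the three statements `STF R0,*AR0`, `|| STF R2,*AR1`, and `ADDF @SUM,R1`, the routine yields one two-statement group followed by one single-statement group.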


Table 11-6. Parallel Instructions


Mnemonic           Description

Parallel Arithmetic with Store Instructions

ABSF   || STF      Absolute value of a floating-point number and store
                   floating-point value
ABSI   || STI      Absolute value of an integer and store integer
ADDF3  || STF      Add floating-point values and store floating-point value
ADDI3  || STI      Add integers and store integer
AND3   || STI      Bitwise logical-AND and store integer
ASH3   || STI      Arithmetic shift and store integer
FIX    || STI      Convert floating-point to integer and store integer
FLOAT  || STF      Convert integer to floating-point value and store
                   floating-point value
FRIEEE || STF      Convert IEEE floating-point format and store
LDF    || STF      Load floating-point value and store floating-point value
LDI    || STI      Load integer and store integer
LSH3   || STI      Logical shift and store integer
MPYF3  || STF      Multiply floating-point values and store floating-point
                   value
MPYI3  || STI      Multiply integer and store integer
NEGF   || STF      Negate floating-point value and store floating-point
                   value
NEGI   || STI      Negate integer and store integer
NOT    || STI      Complement value and store integer
OR3    || STI      Bitwise logical-OR value and store integer
STF    || STF      Store floating-point values
STI    || STI      Store integers
SUBF3  || STF      Subtract floating-point value and store floating-point
                   value
SUBI3  || STI      Subtract integer and store integer
TOIEEE || STF      Convert to IEEE format and store
XOR3   || STI      Bitwise exclusive-OR values and store integer

Parallel Load Instructions

LDF    || LDF      Load floating-point
LDI    || LDI      Load integer

Parallel Multiply and Add/Subtract Instructions

MPYF3  || ADDF3    Multiply and add floating-point
MPYF3  || SUBF3    Multiply and subtract floating-point
MPYI3  || ADDI3    Multiply and add integer
MPYI3  || SUBI3    Multiply and subtract integer
11.2 Condition Codes and Flags


The TMS320C40 provides 20 condition codes (00000-10100, excluding
01011) that can be used with any of the conditional instructions, such as
RETScond or LDFcond. The conditions include signed and unsigned
comparisons, comparisons to zero, and comparisons based on the status of
individual condition flags. Note that all conditional instructions can also
accept the suffix U to indicate unconditional operation.


Seven condition flags provide information about properties of the result of
arithmetic and logical instructions. The condition flags are stored in the
status register (ST) and are affected by an instruction based upon the SET
COND field (bit 15 of the status register).

- If SET COND = 0, the ST condition flags are set if the operation's target
  is any extended-precision register (R0-R11).

- If SET COND = 1, the ST condition flags are also set if the operation's
  target is any register in the primary register file except the status
  register.

- The value of SET COND (0 or 1) does not affect the nature of the
  compare instructions (CMPF, CMPF3, CMPI, CMPI3, TSTB, or TSTB3).


The condition flags may be modified by most instructions when either of the
preceding conditions is established and either of the following two cases
occurs:

- A result is generated when the specified operation is performed to
  infinite precision. This is appropriate for compare-and-test instructions
  that do not store results in a register. It is also appropriate for
  arithmetic instructions that produce underflow or overflow.

- The output is written to the destination register as shown in Table 11-7.
  This is appropriate for other instructions that modify the condition flags.


Table 11-7. Output Value Formats

[Table body not recoverable.]


Figure 11-1 shows the condition flags in the low-order bits of the status
register. Following the figure is a list of status register condition flags and
descriptions of how the flags are set by most instructions. For specific
details of the effect of a particular instruction on the condition flags, see
the description of that instruction in subsection 11.3.3.


Figure 11-1. Status Register

[Figure: status register bit layout, bits 31-0; the labeled fields are
marked R/W.]

NOTE: xx = reserved bit.
      R = read, W = write.


LUF   Latched Underflow Condition Flag. LUF is set whenever UF
      (floating-point underflow flag) is set. LUF may be cleared only by a
      processor reset or by modifying it in the status register (ST).


LV    Latched Overflow Condition Flag. LV is set whenever V (overflow
      condition flag) is set. Otherwise, it is unchanged. LV may be cleared
      only by a processor reset or by modifying it in the status register (ST).


UF    Floating-Point Underflow Condition Flag. A floating-point underflow
      occurs whenever the exponent of the result is less than or equal
      to -128. If a floating-point underflow occurs, UF is set, and the output
      value is set to 0. UF is cleared if a floating-point underflow does not
      occur.


N     Negative Condition Flag. Logical operations assign N the state of
      the MSB of the output value. For integer and floating-point
      operations, N is set if the result is negative, and cleared otherwise.
      Zero is positive.


Z     Zero Condition Flag. For logical, integer, and floating-point
      operations, Z is set if the output is 0, and cleared otherwise.


V     Overflow Condition Flag. For integer operations, V is set if the
      result does not fit into the format specified for the destination (that
      is, the result lies outside the range -2^31 <= result <= 2^31 - 1).
      Otherwise, V is cleared. For floating-point operations, V is set if the
      exponent of the result is greater than 127; otherwise, V is cleared.
      Logical operations always clear V.


C     Carry Flag. When an integer addition is performed, C is set if a carry
      occurs out of the bit corresponding to the MSB of the output. When an
      integer subtraction is performed, C is set if a borrow occurs into the
      bit corresponding to the MSB of the output. Otherwise, for integer
      operations, C is cleared. The carry flag is unaffected by
      floating-point and logical operations. For shift instructions, this flag
      is set to the final value shifted out; for a zero shift count, this is set
      to zero.
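
The C, V, Z, and LV definitions above can be checked against a small model of a 32-bit integer addition. This Python sketch is a simplified illustration, not a description of the hardware: it assumes plain wraparound of the 32-bit output (no saturation), it omits the N flag, and it ignores the SET COND gating. The function name and the dictionary layout are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_flags(a, b, lv=0):
    """Model C, V, Z and the latched LV flag for a 32-bit integer add."""
    a &= MASK32
    b &= MASK32
    full = a + b                      # sum before truncation to 32 bits
    result = full & MASK32            # assumption: output simply wraps
    sign_a = (a >> 31) & 1
    sign_b = (b >> 31) & 1
    sign_r = (result >> 31) & 1
    c = 1 if full > MASK32 else 0     # carry out of the MSB
    # Signed overflow: both operands share a sign the result does not have.
    v = 1 if (sign_a == sign_b and sign_r != sign_a) else 0
    z = 1 if result == 0 else 0
    return {"result": result, "C": c, "V": v, "Z": z, "LV": lv | v}
```

Passing the previous LV value back in on each call reproduces the latching behavior: once V has been set, LV stays set until it is explicitly cleared.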
Table 11-8 lists the condition mnemonic, code, description, and flag for
each of the 19 condition codes.


Table 11-8. Condition Codes and Flags


Condition   Description

Unconditional Compares

U           Unconditional

Unsigned Compares

LO          Lower than
LS          Lower than or same as
HI          Higher than
HS          Higher than or same as
EQ          Equal to
NE          Not equal to

Signed Compares

LT          Less than
LE          Less than or equal to
GT          Greater than
GE          Greater than or equal to
EQ          Equal to
NE          Not equal to

Compare to Zero

Z           Zero
NZ          Not zero
P           Positive
N           Negative
NN          Nonnegative

Compare to Condition Flags

NN          Nonnegative
N           Negative
NZ          Nonzero
Z           Zero
NV          No overflow
V           Overflow
NUF         No underflow
UF          Underflow
NC          No carry
C           Carry
NLV         No latched overflow
LV          Latched overflow
NLUF        No latched floating-point underflow
LUF         Latched floating-point underflow
ZUF         Zero or floating-point underflow


† The ~ means logical complement ("not true" condition).
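
The conditions in the "Compare to Condition Flags" group each test a single flag or its complement, so they can be written as a small lookup. In this Python sketch the mnemonic spellings (NV, NUF, NC, NLV, NLUF, ZUF, and so on) are assumed from the descriptions in the table, the flag dictionary layout is hypothetical, and the compare-style conditions (LO, LS, HI, and the like) are deliberately left out because their flag expressions are not legible here.

```python
# Each entry maps a condition mnemonic (assumed spelling) to a test on the
# status flags. Flags are passed as a dict of 0/1 values.
FLAG_CONDITIONS = {
    "NN": lambda f: not f["N"],
    "N": lambda f: f["N"],
    "NZ": lambda f: not f["Z"],
    "Z": lambda f: f["Z"],
    "NV": lambda f: not f["V"],
    "V": lambda f: f["V"],
    "NUF": lambda f: not f["UF"],
    "UF": lambda f: f["UF"],
    "NC": lambda f: not f["C"],
    "C": lambda f: f["C"],
    "NLV": lambda f: not f["LV"],
    "LV": lambda f: f["LV"],
    "NLUF": lambda f: not f["LUF"],
    "LUF": lambda f: f["LUF"],
    "ZUF": lambda f: f["Z"] or f["UF"],
}


def condition_true(mnemonic, flags):
    """Return True if the flag-based condition holds for these flags."""
    return bool(FLAG_CONDITIONS[mnemonic](flags))
```

Each "N..." entry is the logical complement of its partner, which is the relationship the ~ footnote refers to.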
11.3 Individual Instructions


This section contains the individual assembly language instructions for the 
TMS320C40. The instructions are listed in alphabetical order. Information 
for each instruction includes assembler syntax, operation, operands, en- 
coding, description, cycles, status bits, mode bit, and examples. 


Definitions of the symbols and abbreviations, as well as optional syntax
forms allowed by the assembler, precede the individual instruction
description section. Also, an example instruction shows the special format
used and explains its content.


A functional grouping of the instructions, as well as a complete instruction
set summary, can be found in Section 11.1. Appendix B lists the opcodes
for all the instructions. Refer to Chapter 5 for information on memory
addressing. Code examples using many of the instructions are given in the
Software Applications chapter.


11.3.1 Symbols and Abbreviations 


Table 11-9 lists the symbols and abbreviations used in the individual 
instruction descriptions. 

Table 11-9. Instruction Symbols 


Source operand 

Source operand 1 
Source operand 2 
Source operand 3 
Source operand 4 


Destination operand 
Destination operand 1 
Destination operand 2 
Displacement 
Condition 

Shift count 


General addressing modes 
Three-operand addressing modes 
Parallel addressing modes 
Conditional-branch addressing modes 


Auxiliary register n 

Index register n 

Register address n 
Repeat count register 
Repeat end address register 
Repeat start address register 
Status register 


Carry bit 
Global interrupt enable bit 
Trap vector 
Program counter 

Repeat mode flag 
System stack pointer 


Absolute value of x 

Assign the value of x to destination y 
Mantissa field (sign + fraction) of x 
Exponent field of x 


Bitwise logical-AND of x and y 
Bitwise logical-OR of x and y 
Bitwise logical-XOR of x and y 
Bitwise logical-complement of x 


Shift x to the left y bits 

Shift x to the right y bits 

Increment SP and use incremented SP as address 
Use SP as address and decrement SP 
11.3.2 Optional Assembler Syntaxes 


The assembler allows a relaxed syntax form for some instructions. These 
optional forms simplify the assembly language so that special-case syntax 
can be ignored. The following is a list of these optional syntax forms. 


- The destination register can be omitted on unary arithmetic and logical 
  operations when the same register is used as a source. For example, 

      ABSI R0,R0    can be written as    ABSI R0 

  Instructions affected: ABSI, ABSF, FIX, FLOAT, NEGB, NEGF, NEGI, 
  NORM, NOT, RND. 

- All 3-operand instructions can be written without the 3. For example, 

      ADDI3 R0,R1,R2    can be written as    ADDI R0,R1,R2 

  Instructions affected: ADDC3, ADDF3, ADDI3, AND3, ANDN3, ASH3, 
  LSH3, MPYF3, MPYI3, OR3, SUBB3, SUBF3, SUBI3, XOR3, 
  MPYSHI3, MPYUHI3. 

  This also applies to all the pertinent parallel instructions. 

- All 3-operand comparison instructions can be written without the 3. For 
  example, 

      CMPI3 R0,*AR0    can be written as    CMPI R0,*AR0 

  Instructions affected: CMPI3, CMPF3, TSTB3. 

- Indirect operands with an explicit 0 displacement are allowed. In 
  3-operand or parallel instructions, operands with 0 displacement are 
  automatically converted to no-displacement mode. For example, 

      LDI *+AR0(0),R1    is legal 

  Also, 

      ADDI3 *+AR0(0),R1,R2    is equivalent to    ADDI3 *AR0,R1,R2 

- Indirect operands can be written with no displacement, in which case 
  a displacement of one is assumed. For example, 

      LDI *AR0++(1),R0    can be written    LDI *AR0++,R0 

- All conditional instructions accept the suffix U to indicate unconditional 
  operation. Also, the U can be omitted from unconditional short branch 
  instructions. For example, 

      BU label    can be written    B label 

- Labels can be written with or without a trailing colon. For example, 

      label0:  NOP 
      label1   NOP 
      label2:           (label assembles to next source line) 
- Empty expressions are not allowed for the displacement in indirect 
  mode: 

      LDI *+AR0(),R0    is not legal 

- Immediate-mode destination operands of BR and CALL can be written 
  with an at sign (@): 

      BR label    can be written    BR @label 

- The LDP pseudo-op can be used to load a register (DP by default) with 
  the 16 MSBs of a relocatable address as follows: 

      LDP addr,REG    or    LDP @addr,REG 

  The at sign (@) is optional. 

  If the destination REG is the DP, LDP generates an LDPK instruction. 
  Otherwise it generates an LDIU instruction. In both cases an immediate 
  operand with a special relocation type is used. 

- Parallel instructions can be written in either order. For example, 

         ADDI       can be written as       STI 
      || STI                             || ADDI 

- The parallel bars indicating part 2 of a parallel instruction can be written 
  anywhere on the line from column 0 to the mnemonic. For example, 

         ADDI       can be written as       ADDI 
      || STI                                  || STI 

- If the second operand of a parallel instruction is the same as the third 
  (destination register) operand, the third operand can be omitted. This 
  allows the writing of 3-operand parallel instructions that look like normal 
  2-operand instructions. For example, 

         ADDI *AR0,R2,R2    can be written as       ADDI *AR0,R2 
      || MPYI *AR1,R0,R0                         || MPYI *AR1,R0 

  Instructions affected (applies to all parallel instructions that have a 
  register as the second operand): ADDI, ADDF, AND, MPYI, MPYF, OR, 
  SUBI, SUBF, XOR. 

- All commutative operations in parallel instructions can be written in 
  either order. For example, the ADDI part of a parallel instruction can be 
  written in either of two ways: 

      ADDI *AR0,R1,R2    or    ADDI R1,*AR0,R2 

  The instructions affected are parallel instructions containing any of the 
  following: ADDI, ADDF, MPYI, MPYF, AND, OR, XOR. 
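
The LDP item in the list above loads the 16 MSBs of an address. The short Python sketch below only illustrates that split. It assumes a 32-bit address whose upper 16 bits act as the data page and whose lower 16 bits are the direct-addressing offset; the function names are hypothetical and are not assembler output.

```python
def data_page(addr: int) -> int:
    """Return the 16 most significant bits of a 32-bit address (the value LDP would load)."""
    return (addr >> 16) & 0xFFFF


def direct_address(dp: int, offset: int) -> int:
    """Concatenate a 16-bit data page with a 16-bit direct-addressing offset."""
    return ((dp & 0xFFFF) << 16) | (offset & 0xFFFF)


if __name__ == "__main__":
    # Illustrative values: page 80h combined with offset 98AEh.
    print(hex(data_page(0x008098AE)))
    print(hex(direct_address(0x80, 0x98AE)))
```

Under this assumption, an operand written as @98AEh with DP = 80h refers to location 80 98AEh.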


Use the syntax in Table 11-10 to designate CPU registers in operands. 
11.3.3 Individual Instruction Descriptions 


Each assembly language instruction for the TMS320C40 is described in 
this section in alphabetical order. The description includes the assembler 
syntax, operation, operands, encoding, description, cycles, status bits, 
mode bit, and examples. 


Table 11-10. CPU Register Syntax 


Register Name    Machine Value (hex)    Assigned Function 


Extended-precision register 0 
Extended-precision register 1 
Extended-precision register 2 
Extended-precision register 3 
Extended-precision register 4 
Extended-precision register 5 
Extended-precision register 6 
Extended-precision register 7 
Extended-precision register 8 
Extended-precision register 9 
Extended-precision register 10 
Extended-precision register 11 


Explained 
in 
Paragraph 


Auxiliary register 0 
Auxiliary register 1 
Auxiliary register 2 
Auxiliary register 3 
Auxiliary register 4 
Auxiliary register 5 
Auxiliary register 6 
Auxiliary register 7 


Data-page pointer 
Index register 0 
Index register 1 
Block-size register 
System stack pointer 


Status register 
DMA Coprocessor interrupt enable 
Internal-interrupt enable register 

IIOF pins and interrupt flag register 


Repeat start address 
Repeat end address 
Repeat counter 
EXAMPLE         Example Instruction 


Syntax          INST src, dst 
                or 
                INST1 src2, dst1 
                || INST2 src3, dst2 

Each instruction begins with an assembler syntax expression. Labels may 
be placed either before the command (instruction mnemonic) on the same 
line or on the preceding line in the first column. The optional comment field 
that concludes the syntax is not included in the syntax expression. 
Spaces are required between fields (label, command, operand, and 
comment fields). 

The syntax examples illustrate the common one-line syntax and the two-line 
syntax used in parallel addressing. Note that the two vertical bars (||) that 
indicate a parallel addressing pair can be placed anywhere before the 
mnemonic on the second line. The first instruction in the pair can have a 
label, but the second instruction cannot have a label. 


Operation       |src| → dst 
                or 
                |src2| → dst1 
                || src3 → dst2 

The instruction operation sequence describes the processing that takes 
place when the instruction is executed. For parallel instructions, the 
operation sequence is performed in parallel. Conditional effects of status 
register specified modes are listed for conditional instructions such as Bcond. 


Operands        src general addressing modes (G): 
                    00  register (any register in CPU primary register file) 
                    01  direct 
                    10  indirect 
                    11  immediate 
                dst register (any register in CPU primary register file) 
                or 
                src2  indirect (disp = 0, 1, IR0, IR1) 
                dst1  register (R0-R7) 
                src3  register (R0-R7) 
                dst2  indirect (disp = 0, 1, IR0, IR1) 

Operands are defined according to the addressing mode and/or the type 
of addressing used. Note that indirect addressing uses displacements and 
the index registers. Refer to Chapter 5 for detailed information on 
addressing. 


Encoding        [Instruction-word diagrams for the general-addressing and 
                parallel-addressing forms; bit positions 31, 24, 23, 16, 15, 
                8, 7, 0] 

Encoding examples are shown for general addressing and parallel 
addressing. The instruction pair for the parallel addressing example 
consists of INST1 and INST2. 


Description     Instruction execution and its effect on the rest of the 
                processor or memory contents are described. Any constraints 
                on the operands imposed by the processor or the assembler 
                are discussed. The description parallels and supplements the 
                information given by the operation block. 


Cycles          1 

The digit specifies the number of cycles required to execute the instruction. 


Status Bits     LUF  Latched Floating-Point Underflow Condition Flag. 1 if a 
                     floating-point underflow occurs, unchanged otherwise. 

                LV   Latched Overflow Condition Flag. 1 if an integer or 
                     floating-point overflow occurs, unchanged otherwise. 

                UF   Floating-Point Underflow Condition Flag. 1 if a 
                     floating-point underflow occurs, 0 otherwise. 

                N    Negative Condition Flag. 1 if a negative result is 
                     generated, 0 otherwise. In some instructions, this flag 
                     is the MSB of the output. 

                Z    Zero Condition Flag. 1 if a zero result is generated, 
                     0 otherwise. For logical and shift instructions, 1 if a 
                     zero output is generated, 0 otherwise. 

                V    Overflow Condition Flag. 1 if an integer or 
                     floating-point overflow occurs, 0 otherwise. 

                C    Carry Flag. 1 if a carry or borrow occurs, 0 otherwise. 
                     For shift instructions, this flag is set to the value of 
                     the last bit shifted out; 0 for a shift count of 0. 

The seven condition flags are stored in the status register (ST). They 
provide information about the properties of the result or output of 
arithmetic or logical operations. 
Mode Bit        OVM  Overflow Mode Flag. In general, integer operations are 
                     affected by the OVM bit value (described in Table 3-2 
                     on page 3-6). 


Example         INST @98AEh,R5 

                Before Instruction: 

                DP = 80h 
                R5 = 07 6690 0000h = 2.30562500e+02 
                Memory at 80 98AEh = 5CDFh = 1.00001107e+00 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 

                After Instruction: 

                DP = 80h 
                R5 = 00 6690 0000h = 1.80126953e+00 
                Memory at 80 98AEh = 5CDFh = 1.00001107e+00 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 

The sample code presented in the above format shows the effect of the code 
on system pointers (e.g., DP or SP), registers (e.g., R1 or R5), memory at 
specific locations, and the seven status bits. The values given for the 
registers include the leading zeros to show the exponent in floating-point 
operations. Decimal conversions are provided for all register and memory 
locations. The seven status bits are listed in the order in which they appear 
in the assembler and simulator (see Section 11.2 on page 11-10 and 
Table 11-8 on page 11-12 for further information on these seven status 
bits). 
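
To make the decimal conversions in the examples easier to check, the following Python sketch decodes an extended-precision register value. It assumes the layout implied by the examples: an 8-bit two's-complement exponent followed by a 32-bit mantissa field (sign bit plus 31 fraction bits) with an implied leading bit, and the most negative exponent reserved for zero. The function name is hypothetical.

```python
def decode_extended(exp: int, man: int) -> float:
    """Decode an 8-bit exponent field and a 32-bit mantissa field into a float."""
    e = exp - 256 if exp & 0x80 else exp      # two's-complement exponent
    if e == -128:                             # assumed: reserved exponent means zero
        return 0.0
    sign = (man >> 31) & 1
    frac = (man & 0x7FFFFFFF) / 2**31         # 31 fraction bits
    lead = -2.0 if sign else 1.0              # implied 10.f (negative) or 01.f (positive)
    return (lead + frac) * 2.0**e


if __name__ == "__main__":
    print(decode_extended(0x07, 0x66900000))  # R5 before: 07 6690 0000h
    print(decode_extended(0x00, 0x66900000))  # R5 after:  00 6690 0000h
```

With this model, 07 6690 0000h decodes to 230.5625 and 00 6690 0000h to about 1.80126953, which agrees with the decimal values printed in the example.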
Syntax          ABSF src, dst 

Operation       |src| → dst 

Operands        src general addressing modes (G): 
                    00  register (R0-R11) 
                    01  direct 
                    10  indirect 
                    11  immediate 
                dst register (R0-R11) 


Encoding        [Instruction-word diagram; bit positions 31, 24, 23, 16, 15, 
                8, 7] 


Description     The absolute value of the src operand is loaded into the dst 
                register. The src and dst operands are assumed to be 
                floating-point numbers. 

                An overflow occurs if src(man) = 8000 0000h and 
                src(exp) = 7Fh. The result is dst(man) = 7FFF FFFFh and 
                dst(exp) = 7Fh. 


Cycles          1 


Status Bits     LUF  Unaffected. 
                LV   1 if a floating-point overflow occurs, unchanged otherwise. 
                UF   0. 
                N    0. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if a floating-point overflow occurs, 0 otherwise. 
                C    Unaffected. 


Mode Bit        OVM  Operation is not affected by OVM bit value. 

Example         ABSF R4,R7 

                Before Instruction: 

                R4 = 05C8000 F971h = -9.90337307e+27 
                R7 = 07D2511 00AEh = 5.48527255e+37 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 

                After Instruction: 

                R4 = 05C8000 F971h = -9.90337307e+27 
                R7 = 05C7FFF 068Fh = 9.90337307e+27 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 
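
The overflow rule above can be sketched at the field level in Python. This is a simplified model, not the device's datapath: the function name is hypothetical, and the branch that handles a mantissa of 8000 0000h with an exponent below 7Fh (increment the exponent, clear the mantissa) is an assumption derived from the number format rather than something stated in this section.

```python
def absf(exp: int, man: int):
    """Return (exp, man, overflow) for the absolute value of an extended-precision value.

    exp is the 8-bit two's-complement exponent field, man the 32-bit mantissa field.
    """
    if (man >> 31) == 0:
        return exp, man, False                  # already nonnegative (or zero)
    if man != 0x80000000:
        # Two's-complement negation of the mantissa field; exponent unchanged.
        return exp, (-man) & 0xFFFFFFFF, False
    if exp == 0x7F:
        return 0x7F, 0x7FFFFFFF, True           # documented overflow: saturate
    # Assumption: -2.0 * 2**e becomes +1.0 * 2**(e + 1).
    return (exp + 1) & 0xFF, 0x00000000, False


if __name__ == "__main__":
    e, m, v = absf(0x5C, 0x8000F971)            # R4 from the example
    print(f"{e:02X} {m:08X} overflow={v}")
```

For the operand in the example (exponent 5Ch, mantissa 8000 F971h) the sketch yields mantissa 7FFF 068Fh with the exponent unchanged, matching the value loaded into R7.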
ABSF||STF       Parallel ABSF and STF 


Syntax          ABSF src2, dst1 
                || STF src3, dst2 

Operation       |src2| → dst1 
                || src3 → dst2 

Operands        src2  indirect (disp = 0, 1, IR0, IR1) 
                dst1  register (R0-R7) 
                src3  register (R0-R7) 
                dst2  indirect (disp = 0, 1, IR0, IR1) 


Encoding        [Instruction-word diagram; bit positions 31, 24, 23, 16, 15, 
                8, 7, 0] 


Description     A floating-point absolute value and a floating-point store 
                are performed in parallel. All registers are read at the 
                beginning and loaded at the end of the execute cycle. This 
                means that if one of the parallel operations (STF) reads 
                from a register and the operation being performed in 
                parallel (ABSF) writes to the same register, then STF 
                accepts as input the contents of the register before it is 
                modified by the ABSF. 

                If src2 and dst2 point to the same location, src2 is read 
                before the write to dst2. If src3 and dst1 point to the same 
                register, src3 is read before the write to dst1. 

                An overflow occurs if src(man) = 8000 0000h and 
                src(exp) = 7Fh. The result is dst(man) = 7FFF FFFFh and 
                dst(exp) = 7Fh. 


Cycles          1 


Status Bits     LUF  Unaffected. 
                LV   1 if a floating-point overflow occurs, unchanged otherwise. 
                UF   0. 
                N    0. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if a floating-point overflow occurs, 0 otherwise. 
                C    Unaffected. 


Mode Bit        OVM  Operation is not affected by OVM bit value. 


Example         ABSF *++AR3(IR1),R4 
                || STF R4,*-AR7(1) 

                Before Instruction: 

                AR3 = 80 9800h 
                IR1 = 0AFh 
                R4 = 733C0 0000h = 1.79750e+02 
                AR7 = 80 98C5h 
                Data at 80 98AFh = 58B 4000h = -6.118750e+01 
                Data at 80 98C4h = 0h 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 

                After Instruction: 

                AR3 = 80 98AFh 
                IR1 = 0AFh 
                R4 = 574C0 0000h = 6.118750e+01 
                AR7 = 80 98C5h 
                Data at 80 98AFh = 58B 4000h = -6.118750e+01 
                Data at 80 98C4h = 733 C000h = 1.79750e+02 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 
ABSI            Absolute Value of Integer 


Syntax          ABSI src, dst 

Operation       |src| → dst 

Operands        src general addressing modes (G): 
                    00  register (any register in CPU primary register file) 
                    01  direct 
                    10  indirect 
                    11  immediate 
                dst register (any register in CPU primary register file) 


Encoding        [Instruction-word diagram; bit positions 31, 24, 23, 16, 15, 
                8, 7, 0] 


Description     The absolute value of the src operand is loaded into the dst 
                register. The src and dst operands are assumed to be signed 
                integers. 

                An overflow occurs if src = 8000 0000h. If ST(OVM) = 1, the 
                result is dst = 7FFF FFFFh. If ST(OVM) = 0, the result is 
                dst = 8000 0000h. 


Cycles          1 


Status Bits     If ST(SET COND) = 0 and the destination register is R0-R11, 
                the condition flags are modified. If ST(SET COND) = 1, they 
                are modified for all destination registers. 

                LUF  Unaffected. 
                LV   1 if an integer overflow occurs, unchanged otherwise. 
                UF   0. 
                N    0. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if an integer overflow occurs, 0 otherwise. 
                C    Unaffected. 


Mode Bit        OVM  Operation is affected by OVM bit value. 


Example 1       ABSI R0,R0 
                or 
                ABSI R0 

                Before Instruction: 

                R0 = 0FFFF FFCBh = -53 

                After Instruction: 

                R0 = 035h = 53 


Example 2       ABSI *AR1,R3 

                Before Instruction: 

                AR1 = 20h 
                R3 = 0h 
                Data at 20h = 0FFFF FFCBh = -53 

                After Instruction: 

                AR1 = 20h 
                R3 = 35h = 53 
                Data at 20h = 0FFFF FFCBh = -53 
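
The OVM-dependent result described above can be illustrated with a small Python model. It is a sketch of the stated rule only, operating on 32-bit two's-complement values; the function name and the returned tuple are hypothetical conventions.

```python
def absi(src: int, ovm: bool):
    """Return (dst, v): the 32-bit absolute value and the overflow (V) flag."""
    src &= 0xFFFFFFFF
    if src == 0x80000000:
        # The most negative integer has no positive counterpart: overflow.
        return (0x7FFFFFFF if ovm else 0x80000000), True
    if src & 0x80000000:
        return (-src) & 0xFFFFFFFF, False       # negative: two's-complement negate
    return src, False                           # nonnegative: unchanged


if __name__ == "__main__":
    dst, v = absi(0xFFFFFFCB, ovm=False)        # -53, as in Example 1
    print(hex(dst), v)
```

Applied to 0FFFF FFCBh the sketch returns 35h, as in both examples; only the operand 8000 0000h makes the OVM setting visible.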
ABSI||STI       Parallel ABSI and STI 


Syntax          ABSI src2, dst1 
                || STI src3, dst2 

Operation       |src2| → dst1 
                || src3 → dst2 

Operands        src2  indirect (disp = 0, 1, IR0, IR1) 
                dst1  register (R0-R7) 
                src3  register (R0-R7) 
                dst2  indirect (disp = 0, 1, IR0, IR1) 


Encoding        [Instruction-word diagram; bit positions 8, 7, 0 survive] 


Description     An integer absolute value and an integer store are performed 
                in parallel. All registers are read at the beginning and 
                loaded at the end of the execute cycle. This means that if 
                one of the parallel operations (STI) reads from a register 
                and the operation being performed in parallel (ABSI) writes 
                to the same register, then STI accepts as input the contents 
                of the register before it is modified by the ABSI. 

                If src2 and dst2 point to the same location, src2 is read 
                before the write to dst2. 

                An overflow occurs if src = 8000 0000h. If ST(OVM) = 1, the 
                result is dst = 7FFF FFFFh. If ST(OVM) = 0, the result is 
                dst = 8000 0000h. 


Cycles          1 


Status Bits     LUF  Unaffected. 
                LV   1 if an integer overflow occurs, unchanged otherwise. 
                UF   0. 
                N    0. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if an integer overflow occurs, 0 otherwise. 
                C    Unaffected. 


Mode Bit        OVM  Operation is affected by OVM bit value. 


Example         ABSI *-AR5(1),R5 
                || STI R1,*AR2--(IR1) 

                Before Instruction: 

                AR5 = 80 99E2h 
                R5 = 0h 
                R1 = 42h = 66 
                AR2 = 80 98FFh 
                IR1 = 0Fh 
                Data at 80 99E1h = 0FFFF FFCBh = -53 
                Data at 80 98FFh = 2h = 2 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 

                After Instruction: 

                AR5 = 80 99E2h 
                R5 = 35h = 53 
                R1 = 42h = 66 
                AR2 = 80 98F0h 
                IR1 = 0Fh 
                Data at 80 99E1h = 0FFFF FFCBh = -53 
                Data at 80 98FFh = 42h = 66 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 
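
The read-then-write ordering described for parallel instructions can be modelled as two phases. The Python sketch below does this for ABSI||STI. It is a simplified illustration: registers and memory are plain dictionaries, effective addresses are supplied by the caller (address-register updates are not modelled), the overflow case is left as in OVM = 0, and all names are hypothetical.

```python
def absi_sti(regs: dict, mem: dict, src2_addr: int, dst1: str,
             src3: str, dst2_addr: int) -> None:
    """Model ABSI src2,dst1 || STI src3,dst2 with all reads before any write."""
    # Read phase: both source operands are sampled first.
    operand = mem[src2_addr] & 0xFFFFFFFF
    stored = regs[src3]

    # Compute the absolute value (8000 0000h is left unchanged, as with OVM = 0).
    if operand & 0x80000000 and operand != 0x80000000:
        operand = (-operand) & 0xFFFFFFFF

    # Write phase: results are committed only after every read has happened.
    regs[dst1] = operand
    mem[dst2_addr] = stored


if __name__ == "__main__":
    regs = {"R5": 0x0, "R1": 0x42}
    mem = {0x8099E1: 0xFFFFFFCB, 0x8098FF: 0x2}
    absi_sti(regs, mem, 0x8099E1, "R5", "R1", 0x8098FF)
    print(hex(regs["R5"]), hex(mem[0x8098FF]))
```

Because the store operand is sampled in the read phase, using the same register for dst1 and src3 stores the old register contents, which is the behavior the description states.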
ADDC            Add Integer With Carry 


Syntax          ADDC src, dst 

Operation       dst + src + C → dst 

Operands        src general addressing modes (G): 
                    00  register (any register in CPU primary register file) 
                    01  direct 
                    10  indirect 
                    11  immediate 
                dst register (any register in CPU primary register file) 


Encoding        [Instruction-word diagram; bit positions 31, 24, 23, 16, 15, 
                8, 7] 


Description     The sum of the dst and src operands and the C (carry) flag 
                is loaded into the dst register. The dst and src operands 
                are assumed to be signed integers. 


Cycles          1 


Status Bits     If ST(SET COND) = 0 and the destination register is R0-R11, 
                the condition flags are modified. If ST(SET COND) = 1, they 
                are modified for all destination registers. 

                LUF  Unaffected. 
                LV   1 if an integer overflow occurs, unchanged otherwise. 
                UF   0. 
                N    1 if a negative result is generated, 0 otherwise. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if an integer overflow occurs, 0 otherwise. 
                C    1 if a carry occurs, 0 otherwise. 


Mode Bit        OVM  Operation is affected by OVM bit value. 


Example         ADDC R1,R5 

                Before Instruction: 

                R1 = 00FFFF 5C25h = -41,947 
                R5 = 00FFFF 019Eh = -65,122 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 

                After Instruction: 

                R1 = 00FFFF 5C25h = -41,947 
                R5 = 00FFFE 5DC4h = -107,068 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 
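
The operation dst + src + C can be sketched in Python on 32-bit values. This is a model under stated assumptions, not a description of the hardware: the function name is hypothetical, the flags follow the Status Bits wording above, and saturation to 7FFF FFFFh or 8000 0000h when OVM = 1 is an assumption carried over from the ABSI description.

```python
def addc(dst: int, src: int, carry_in: bool, ovm: bool = False):
    """Return (result, flags) for a 32-bit add with carry."""
    a, b = dst & 0xFFFFFFFF, src & 0xFFFFFFFF
    total = a + b + (1 if carry_in else 0)
    result = total & 0xFFFFFFFF
    c = total > 0xFFFFFFFF                      # carry out of bit 31
    # Signed overflow: both operands share a sign that the result does not.
    v = ((a ^ result) & (b ^ result) & 0x80000000) != 0
    if v and ovm:
        # Assumption: saturate toward the sign of the operands.
        result = 0x80000000 if a & 0x80000000 else 0x7FFFFFFF
    flags = {"N": bool(result & 0x80000000), "Z": result == 0, "V": v, "C": c}
    return result, flags


if __name__ == "__main__":
    result, flags = addc(0xFFFF019E, 0xFFFF5C25, carry_in=True)
    print(hex(result), flags)
```

In this model the operands from the example produce 0FFFE 5DC4h (-107,068) when the carry-in is 1, and 0FFFE 5DC3h when it is 0.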
ADDC3           Add Integer With Carry, 3 Operands 


Syntax          ADDC3 src2, src1, dst 

Operation       src1 + src2 + C → dst 

Operands        src1, src2  both type 1 or type 2 three-operand addressing modes 
                dst  register mode (any register in CPU primary register file) 


Encoding        [Type 1 and Type 2 instruction-word diagrams; bit positions 
                31, 24, 23, 16, 15, 8, 7, 0] 

Instruction Word Fields 

Type 1 
        src1 addressing modes                    src2 addressing modes 
    00  register mode (any CPU register)         register mode (any CPU register) 
    01  indirect mode (disp = 0, 1, IR0, IR1)    register mode (any CPU register) 
    10  register mode (any CPU register)         indirect mode (disp = 0, 1, IR0, IR1) 
    11  indirect mode (disp = 0, 1, IR0, IR1)    indirect mode (disp = 0, 1, IR0, IR1) 

Type 2 
        src1 addressing modes                    src2 addressing modes 
    00  register mode (any CPU register)         8-bit signed immediate 
    01  register mode (any CPU register)         indirect mode *+ARn(5-bit unsigned displacement) 
    10  indirect mode *+ARn(5-bit unsigned       8-bit signed immediate 
        displacement) 
    11  indirect mode *+ARn1(5-bit unsigned      indirect mode *+ARn2(5-bit unsigned 
        displacement)                            displacement) 


Description     The sum of the src1 and src2 operands and the value of the 
                C (carry) flag is loaded into the dst register. The src1, 
                src2, and dst operands are assumed to be signed integers. 


Cycles          1 


Status Bits     If ST(SET COND) = 0, the condition flags are modified if the 
                destination register is R0-R11. If ST(SET COND) = 1, they 
                are modified for all destination registers. 

                LUF  Unaffected. 
                LV   1 if an integer overflow occurs, unchanged otherwise. 
                UF   0. 
                N    1 if a negative result is generated, 0 otherwise. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if an integer overflow occurs, 0 otherwise. 
                C    1 if a carry occurs, 0 otherwise. 


Mode Bit        OVM  Operation is affected by OVM bit value. 
Syntax          ADDF src, dst 

Operation       dst + src → dst 

Operands        src general addressing modes (G): 
                    00  register (R0-R11) 
                    01  direct 
                    10  indirect 
                    11  immediate 
                dst register (R0-R11) 


Encoding        [Instruction-word diagram; bit positions 31, 24, 23, 16, 15, 
                8, 7, 0] 


Description     The sum of the dst and src operands is loaded into the dst 
                register. The dst and src operands are assumed to be 
                floating-point numbers. 


Cycles          1 


Status Bits     LUF  1 if a floating-point underflow occurs, unchanged otherwise. 
                LV   1 if a floating-point overflow occurs, unchanged otherwise. 
                UF   1 if a floating-point underflow occurs, 0 otherwise. 
                N    1 if a negative result is generated, 0 otherwise. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if a floating-point overflow occurs, 0 otherwise. 
                C    Unaffected. 


Mode Bit        OVM  Operation is not affected by OVM bit value. 


Example         ADDF *AR4++(IR1),R5 

                Before Instruction: 

                AR4 = 80 9800h 
                IR1 = 12Bh 
                R5 = 057980 0000h = 6.23750e+01 
                Data at 80 9800h = 86B 2800h = 4.7031250e+02 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 

                After Instruction: 

                AR4 = 80 992Bh 
                IR1 = 12Bh 
                R5 = 09052C 0000h = 5.3268750e+02 
                Data at 80 9800h = 86B 2800h = 4.7031250e+02 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 
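
The examples in this section use several indirect operand forms. The Python sketch below reproduces the address arithmetic for five of them; the displacement may be a constant or the contents of an index register. It is an illustration only: circular and bit-reversed addressing are not modelled, the 32-bit wraparound is an assumption, and the function name and mode strings are hypothetical.

```python
def indirect(ar: int, disp: int, mode: str):
    """Return (effective_address, new_ar) for a few indirect operand forms."""
    mask = 0xFFFFFFFF                       # assumed 32-bit address arithmetic
    if mode == "*+ARn(disp)":               # pre-displacement add, ARn unchanged
        return (ar + disp) & mask, ar
    if mode == "*-ARn(disp)":               # pre-displacement subtract, ARn unchanged
        return (ar - disp) & mask, ar
    if mode == "*++ARn(disp)":              # pre-increment and modify ARn
        new = (ar + disp) & mask
        return new, new
    if mode == "*ARn++(disp)":              # use ARn, then post-increment
        return ar, (ar + disp) & mask
    if mode == "*ARn--(disp)":              # use ARn, then post-decrement
        return ar, (ar - disp) & mask
    raise ValueError(f"unsupported mode: {mode}")


if __name__ == "__main__":
    # ADDF *AR4++(IR1),R5 with AR4 = 80 9800h and IR1 = 12Bh
    addr, ar4 = indirect(0x809800, 0x12B, "*ARn++(disp)")
    print(hex(addr), hex(ar4))
```

For the ADDF example, the operand is fetched from 80 9800h and AR4 is then advanced to 80 992Bh, as the register listing shows.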
ADDF3           Add Floating-Point Values, 3 Operands 


Syntax          ADDF3 src2, src1, dst 

Operation       src1 + src2 → dst 

Operands        src1, src2  both type 1 or type 2 three-operand addressing modes 
                dst  register mode (R0-R11) 


Encoding        [Type 1 and Type 2 instruction-word diagrams; bit positions 
                31, 24, 23, 16, 15, 8, 7, 0] 

Instruction Word Fields 

Type 1 
        src1 addressing modes                    src2 addressing modes 
    00  register mode (R0-R11)                   register mode (R0-R11) 

Type 2 
        src1 addressing modes                    src2 addressing modes 
        register mode (any CPU register)         indirect mode *+ARn(5-bit unsigned 
                                                 displacement) 
    11  indirect mode *+ARn1(5-bit unsigned      indirect mode *+ARn2(5-bit unsigned 
        displacement)                            displacement) 


Description     The sum of the src1 and src2 operands is loaded into the dst 
                register. The src1, src2, and dst operands are assumed to be 
                floating-point numbers. 


Cycles          1 


Status Bits     LUF  1 if a floating-point underflow occurs, unchanged otherwise. 
                LV   1 if a floating-point overflow occurs, unchanged otherwise. 
                UF   1 if a floating-point underflow occurs, 0 otherwise. 
                N    1 if a negative result is generated, 0 otherwise. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if a floating-point overflow occurs, 0 otherwise. 
                C    Unaffected. 


Mode Bit        OVM  Operation is not affected by OVM bit value. 


Example         ADDF3 *+AR1(2),*+AR1(8),R4 

                Before Instruction: 

                AR1 = 2F F820h 
                R4 = 0h 
                Data at 22F F822h = 700 F000h = 1.28940e+02 
                Data at 22F F828h = 34C 2000h = 1.27590e+01 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 

                After Instruction: 

                AR1 = 2F F820h 
                R4 = 070DB2 0000h = 1.41695313e+02 
                Data at 22F F828h = 34C 2000h = 1.27590e+01 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 
ADDF3||STF      Parallel ADDF3 and STF 


Syntax          ADDF3 src2, src1, dst1 
                || STF src3, dst2 

Operation       src1 + src2 → dst1 
                || src3 → dst2 

Operands        src1  register (R0-R7) 
                src2  indirect (disp = 0, 1, IR0, IR1) 
                dst1  register (R0-R7) 
                src3  register (R0-R7) 
                dst2  indirect (disp = 0, 1, IR0, IR1) 


Encoding        [Instruction-word diagram; bit positions 31, 24, 23, 16, 15, 
                8, 7, 0] 


Description     A floating-point addition and a floating-point store are 
                performed in parallel. All registers are read at the 
                beginning and loaded at the end of the execute cycle. This 
                means that if one of the parallel operations (STF) reads 
                from a register and the operation being performed in 
                parallel (ADDF3) writes to the same register, then STF 
                accepts as input the contents of the register before it is 
                modified by the ADDF3. 

                If src2 and dst2 point to the same location, src2 is read 
                before the write to dst2. 


Cycles          1 


Status Bits     LUF  1 if a floating-point underflow occurs, unchanged otherwise. 
                LV   1 if a floating-point overflow occurs, unchanged otherwise. 
                UF   1 if a floating-point underflow occurs, 0 otherwise. 
                N    1 if a negative result is generated, 0 otherwise. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if a floating-point overflow occurs, 0 otherwise. 
                C    Unaffected. 


Mode Bit        OVM  Operation is not affected by OVM bit value. 


Example         ADDF3 *+AR3(IR1),R2,R5 
                || STF R4,*AR2 

                Before Instruction: 

                AR3 = 80 9800h 
                IR1 = 0A5h 
                R2 = 070C80 0000h = 1.4050e+02 
                R5 = 0h 
                R4 = 057B40 0000h = 6.281250e+01 
                AR2 = 80 98F3h 
                Data at 80 98A5h = 733 C000h = 1.79750e+02 
                Data at 80 98F3h = 0h 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 

                After Instruction: 

                AR3 = 80 9800h 
                IR1 = 0A5h 
                R2 = 070C80 0000h = 1.4050e+02 
                R5 = 082020 0000h = 3.20250e+02 
                R4 = 057B40 0000h = 6.281250e+01 
                AR2 = 80 98F3h 
                Data at 80 98A5h = 733 C000h = 1.79750e+02 
                Data at 80 98F3h = 57B 4000h = 6.28125e+01 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 
ADDI            Add Integer 


Syntax          ADDI src, dst 

Operation       dst + src → dst 

Operands        src general addressing modes (G): 
                    00  register (any register in CPU primary register file) 
                    01  direct 
                    10  indirect 
                    11  immediate 
                dst register (any register in CPU primary register file) 


Encoding        [Instruction-word diagram; bit positions 31, 24, 23, 16, 15, 
                8, 7, 0] 


Description     The sum of the dst and src operands is loaded into the dst 
                register. The dst and src operands are assumed to be signed 
                integers. 


Status Bits     If ST(SET COND) = 0 and the destination register is R0-R11, 
                the condition flags are modified. If ST(SET COND) = 1, they 
                are modified for all destination registers. 

                LUF  Unaffected. 
                LV   1 if an integer overflow occurs, unchanged otherwise. 
                UF   0. 
                N    1 if a negative result is generated, 0 otherwise. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if an integer overflow occurs, 0 otherwise. 
                C    1 if a carry occurs, 0 otherwise. 


Mode Bit        OVM  Operation is affected by OVM bit value. 


Example         ADDI R3,R7 

                Before Instruction: 

                R3 = 0FFFF FFCBh = -53 
                R7 = 35h = 53 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 

                After Instruction: 

                R3 = 0FFFF FFCBh = -53 
                R7 = 0h 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 
ADDI3           Add Integer, 3 Operands 


Syntax          ADDI3 src2, src1, dst 

Operation       src1 + src2 → dst 

Operands        src1, src2  both type 1 or type 2 three-operand addressing modes 
                dst  register mode (any register in CPU primary register file) 


Encoding        [Type 1 and Type 2 instruction-word diagrams; bit positions 
                31, 24, 23, 16, 15, 8, 7, 0] 

Instruction Word Fields 

Type 1 
        src1 addressing modes                    src2 addressing modes 
    00  register mode (any CPU register)         register mode (any CPU register) 
    01  indirect mode (disp = 0, 1, IR0, IR1)    register mode (any CPU register) 
    10  register mode (any CPU register)         indirect mode (disp = 0, 1, IR0, IR1) 
    11  indirect mode (disp = 0, 1, IR0, IR1)    indirect mode (disp = 0, 1, IR0, IR1) 

Type 2 
        src1 addressing modes                    src2 addressing modes 
    00  register mode (any CPU register)         8-bit signed immediate 
    01  register mode (any CPU register)         indirect mode *+ARn(5-bit unsigned displacement) 
    10  indirect mode *+ARn(5-bit unsigned       8-bit signed immediate 
        displacement) 
    11  indirect mode *+ARn1(5-bit unsigned      indirect mode *+ARn2(5-bit unsigned 
        displacement)                            displacement) 


Description     The sum of the src1 and src2 operands is loaded into the dst 
                register. The src1, src2, and dst operands are assumed to be 
                signed integers. 


Cycles          1 


Status Bits     If ST(SET COND) = 0 and the destination register is R0-R11, 
                the condition flags are modified. If ST(SET COND) = 1, they 
                are modified for all destination registers. 

                LUF  Unaffected. 
                LV   1 if an integer overflow occurs, unchanged otherwise. 
                UF   0. 
                N    1 if a negative result is generated, 0 otherwise. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if an integer overflow occurs, 0 otherwise. 
                C    1 if a carry occurs, 0 otherwise. 


Mode Bit        OVM  Operation is affected by OVM bit value. 
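
The rule that opens the Status Bits block, which recurs for the integer and logical instructions in this section, amounts to a small predicate. The Python sketch below states it directly; the register-name strings and the function name are hypothetical conventions chosen for illustration.

```python
# Extended-precision registers R0 through R11, written as strings.
EXTENDED_PRECISION = {f"R{i}" for i in range(12)}


def flags_modified(dst: str, set_cond: int) -> bool:
    """Return True if the condition flags are updated for this destination."""
    if set_cond == 1:
        return True                       # flags follow every destination register
    return dst in EXTENDED_PRECISION      # otherwise only R0-R11 update the flags


if __name__ == "__main__":
    for dst in ("R7", "AR2"):
        print(dst, flags_modified(dst, 0), flags_modified(dst, 1))
```

With ST(SET COND) = 0, an integer add into AR2 leaves the flags untouched, while the same add into R7 updates them.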


ADDI3||STI      Parallel ADDI3 and STI 


Syntax          ADDI3 src2, src1, dst1 
                || STI src3, dst2 

Operation       src1 + src2 → dst1 
                || src3 → dst2 

Operands        src1  register (R0-R7) 
                src2  indirect (disp = 0, 1, IR0, IR1) 
                dst1  register (R0-R7) 
                src3  register (R0-R7) 
                dst2  indirect (disp = 0, 1, IR0, IR1) 


Encoding        [Instruction-word diagram; bit positions 31, 24, 23, 16, 15, 
                8, 7] 


Description     An integer addition and an integer store are performed in 
                parallel. All registers are read at the beginning and loaded 
                at the end of the execute cycle. This means that if one of 
                the parallel operations (STI) reads from a register and the 
                operation being performed in parallel (ADDI3) writes to the 
                same register, then STI accepts as input the contents of the 
                register before it is modified by the ADDI3. 

                If src2 and dst2 point to the same location, src2 is read 
                before the write to dst2. 


Cycles          1 


Status Bits     LUF  Unaffected. 
                LV   1 if an integer overflow occurs, unchanged otherwise. 
                UF   0. 
                N    1 if a negative result is generated, 0 otherwise. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    1 if an integer overflow occurs, 0 otherwise. 
                C    1 if a carry occurs, 0 otherwise. 


Mode Bit        OVM  Operation is affected by OVM bit value. 


Example         ADDI3 *AR0--(IR0),R5,R0 
                || STI R3,*AR7 

                Before Instruction: 

                AR0 = 80 992Ch 
                IR0 = 0Ch 
                R5 = 0DCh = 220 
                R0 = 0h 
                R3 = 35h = 53 
                AR7 = 80 983Bh 
                Data at 80 992Ch = 12Ch = 300 
                Data at 80 983Bh = 0h 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 

                After Instruction: 

                AR0 = 80 9920h 
                IR0 = 0Ch 
                R5 = 0DCh = 220 
                R0 = 208h = 520 
                R3 = 35h = 53 
                AR7 = 80 983Bh 
                Data at 80 992Ch = 12Ch = 300 
                Data at 80 983Bh = 35h = 53 
                LUF LV UF N Z V C = 0 0 0 0 0 0 0 
AND             Bitwise Logical-AND 


Syntax          AND src, dst 

Operation       dst AND src → dst 

Operands        src general addressing modes (G): 
                    00  register (any register in CPU primary register file) 
                    01  direct 
                    10  indirect 
                    11  immediate (not sign-extended) 
                dst register (any register in CPU primary register file) 


Encoding        [Instruction-word diagram; bit positions 31, 24, 23, 16, 15, 
                8, 7, 0] 


Description     The bitwise logical-AND between the dst and src operands is 
                loaded into the dst register. The dst and src operands are 
                assumed to be unsigned integers. 


Cycles          1 


Status Bits     If ST(SET COND) = 0 and the destination register is R0-R11, 
                the condition flags are modified. If ST(SET COND) = 1, they 
                are modified for all destination registers. 

                LUF  Unaffected. 
                LV   Unaffected. 
                UF   0. 
                N    MSB of the output. 
                Z    1 if a zero result is generated, 0 otherwise. 
                V    0. 
                C    Unaffected. 


Mode Bit        OVM  Operation is not affected by OVM bit value. 


Example         AND R1,R2 

                Before Instruction: 

                R1 = 80h 
                R2 = 0AFFh 
                LUF LV UF N Z V C = 0 0 0 0 0 0 1 

                After Instruction: 

                R1 = 80h 
                R2 = 80h 
                LUF LV UF N Z V C = 0 0 0 0 0 0 1 
AND3        Bitwise Logical-AND, 3 Operands

Syntax      AND3 src2, src1, dst

Operation   src1 AND src2 → dst

Operands    src1, src2 both type 1 or type 2 three-operand addressing modes

            dst register mode (any register in CPU primary register file)

Encoding    [Encoding diagrams, Type 1 and Type 2: bits 31-24, 23-16, 15-8, 7-0]

Instruction Word Fields

Type 1
T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)   register mode (any CPU register)
10   register mode (any CPU register)        indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)   indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        8-bit signed immediate
01   register mode (any CPU register)        indirect mode *+ARn(5-bit unsigned
                                             displacement)
10   indirect mode *+ARn(5-bit unsigned      8-bit signed immediate
     displacement)
11   indirect mode *+ARn1(5-bit unsigned     indirect mode *+ARn2(5-bit unsigned
     displacement)                           displacement)
Description The bitwise logical-AND between the src1 and src2 operands is loaded into
            the dst register. The src1, src2, and dst operands are assumed to be
            unsigned integers.

Cycles      1

Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
            flags are modified. If ST (SET COND) = 1, they are modified for all
            destination registers.

            LUF Unaffected.
            LV  Unaffected.
            UF  0.
            N   MSB of the output.
            Z   1 if a zero result is generated, 0 otherwise.
            V   0.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.
AND3||STI   Parallel AND3 and STI

Syntax         AND3 src2, src1, dst1
            || STI  src3, dst2

Operation      src1 AND src2 → dst1
            || src3 → dst2

Operands    src1 register (R0-R7)
            src2 indirect (disp = 0, 1, IR0, IR1)
            dst1 register (R0-R7)
            src3 register (R0-R7)
            dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding    [Encoding diagram: bits 31-24, 23-16, 15-8, 7-0]

Description A bitwise logical-AND and an integer store are performed in parallel. All
            registers are read at the beginning and loaded at the end of the execute
            cycle. This means that if one of the parallel operations (STI) reads from
            a register and the operation being performed in parallel (AND3) writes to
            the same register, then STI accepts as input the contents of the register
            before it is modified by the AND3.

            If src2 and dst2 point to the same location, src2 is read before the
            write to dst2.

Cycles      1

Status Bits LUF Unaffected.
            LV  Unaffected.
            UF  0.
            N   MSB of the output.
            Z   1 if a zero result is generated, 0 otherwise.
            V   0.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.

Example        AND3 *+AR1(IR0),R4,R7
            || STI  R3,*AR2

            Before Instruction:

            AR1 = 80 99F1h
            IR0 = 8h
            R4 = 0A323h
            R7 = 0h
            R3 = 35h = 53
            AR2 = 80 983Fh
            Data at 80 99F9h = 5C53h
            Data at 80 983Fh = 0h
            LUF LV UF N Z V C = 0 0 0 0 0 0 0

            After Instruction:

            AR1 = 80 99F1h
            IR0 = 8h
            R4 = 0A323h
            R7 = 03h
            R3 = 35h = 53
            AR2 = 80 983Fh
            Data at 80 99F9h = 5C53h
            Data at 80 983Fh = 35h = 53
            LUF LV UF N Z V C = 0 0 0 0 0 0 0
ANDN        Bitwise Logical-AND With Complement

Syntax      ANDN src, dst

Operation   dst AND ~src → dst

Operands    src general addressing modes (G):
                00  register (any register in CPU primary register file)
                01  direct
                10  indirect
                11  immediate (not sign-extended)

            dst register (any register in CPU primary register file)

Encoding    31        24 23        16 15         8 7          0
            000 000110 | G | dst | src

Description The bitwise logical-AND between the dst operand and the bitwise logical
            complement (~) of the src operand is loaded into the dst register. The dst
            and src operands are assumed to be unsigned integers.

Cycles      1

Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
            flags are modified. If ST (SET COND) = 1, they are modified for all
            destination registers.

            LUF Unaffected.
            LV  Unaffected.
            UF  0.
            N   MSB of the output.
            Z   1 if a zero result is generated, 0 otherwise.
            V   0.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.

Example     ANDN @980Ch,R2

            Before Instruction:

            DP = 80h
            R2 = 0C2Fh
            Data at 80 980Ch = 0A02h
            LUF LV UF N Z V C = 0 0 0 0 0 0 0

            After Instruction:

            DP = 80h
            R2 = 042Dh
            Data at 80 980Ch = 0A02h
            LUF LV UF N Z V C = 0 0 0 0 0 0 0
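
The AND-with-complement operation and the flags it sets can be checked with a short Python sketch. The helper name `andn` and the 32-bit mask are assumptions made for illustration; only the flags that the entry says are written (UF, N, Z, V) are modeled.

```python
MASK32 = 0xFFFFFFFF

def andn(dst, src):
    """Return (result, flags) for dst AND ~src on 32-bit unsigned values."""
    result = dst & ~src & MASK32
    flags = {
        "UF": 0,                      # always cleared
        "N": (result >> 31) & 1,      # MSB of the output
        "Z": int(result == 0),        # 1 if the result is zero
        "V": 0,                       # always cleared
    }
    return result, flags

# Values from the example above: R2 = 0C2Fh, data at 80 980Ch = 0A02h.
result, flags = andn(0x0C2F, 0x0A02)
```

With the example operands the sketch reproduces R2 = 042Dh with N and Z both clear.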
ANDN3       Bitwise Logical-ANDN, 3 Operands

Syntax      ANDN3 src2, src1, dst

Operation   src1 AND ~src2 → dst

Operands    src1, src2 both type 1 or type 2 three-operand addressing modes

            dst register mode (any register in CPU primary register file)

Encoding    [Encoding diagrams, Type 1 and Type 2: bits 31-24, 23-16, 15-8, 7-0]

Instruction Word Fields

Type 1
T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)   register mode (any CPU register)
10   register mode (any CPU register)        indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)   indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        8-bit signed immediate
01   register mode (any CPU register)        indirect mode *+ARn(5-bit unsigned
                                             displacement)
10   indirect mode *+ARn(5-bit unsigned      8-bit signed immediate
     displacement)
11   indirect mode *+ARn1(5-bit unsigned     indirect mode *+ARn2(5-bit unsigned
     displacement)                           displacement)

Description The bitwise logical-AND between the src1 operand and the bitwise logical
            complement (~) of the src2 operand is loaded into the dst register. The
            src1, src2, and dst operands are assumed to be unsigned integers.

Cycles      1

Status Bits LUF Unaffected.
            LV  Unaffected.
            UF  0.
            N   MSB of the output.
            Z   1 if a zero result is generated, 0 otherwise.
            V   0.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.


ASH         Arithmetic Shift

Syntax      ASH count, dst

Operation   If (count ≥ 0):
                dst << count → dst
            Else:
                dst >> |count| → dst

Operands    count general addressing modes (G):
                00  register (any register in CPU primary register file)
                01  direct
                10  indirect
                11  immediate

            dst register (any register in CPU primary register file)

Encoding    [Encoding diagram: bits 31-24, 23-16, 15-8, 7-0]

Description The seven least-significant bits of the count operand are used to generate
            the twos-complement shift count of up to 32 bits.

            If the count operand is greater than zero, the dst operand is left-shifted
            by the value of the count operand. Low-order bits shifted in are
            zero-filled, and high-order bits are shifted out through the C (carry) bit.

            Arithmetic left-shift:

                C ← dst ← 0

            If the count operand is less than zero, the dst operand is right-shifted
            by the absolute value of the count operand. The high-order bits of the dst
            operand are sign-extended as it is right-shifted. Low-order bits are
            shifted out through the C (carry) bit.

            Arithmetic right-shift:

                sign of dst → dst → C

            If the count operand is zero, no shift is performed, and the C (carry) bit
            is set to 0. The count and dst operands are assumed to be signed integers.

Cycles      1

Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
            flags are modified. If ST (SET COND) = 1, they are modified for all
            destination registers.

            LUF Unaffected.
            LV  1 if an integer overflow occurs, unchanged otherwise.
            UF  0.
            N   MSB of the output.
            Z   1 if a zero result is generated, 0 otherwise.
            V   1 if an integer overflow occurs, 0 otherwise.
            C   Set to the value of the last bit shifted out. 0 for a shift count of 0.

Mode Bit    OVM Operation is not affected by OVM bit value.

Example 1   ASH R1,R3

            Before Instruction:

            R1 = 10h = 16
            R3 = 0AE000h
            LUF LV UF N Z V C = 0 0 0 0 0 0 0

            After Instruction:

            R1 = 10h
            R3 = 0E000 0000h
            LUF LV UF N Z V C = 0 1 0 1 0 1 0

Example 2   ASH @98C3h,R5

            Before Instruction:

            DP = 80h
            R5 = 0AEC0 0001h
            Data at 80 98C3h = 0FFE8h = -24
            LUF LV UF N Z V C = 0 0 0 0 0 0 0

            After Instruction:

            DP = 80h
            R5 = 0FFFF FFAEh
            Data at 80 98C3h = 0FFE8h = -24
            LUF LV UF N Z V C = 0 0 0 1 0 0 1
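
The shift-count decoding and the carry behavior described for ASH can be sketched as follows. This is a minimal model, not the device's implementation: the names `shift_count` and `ash` are hypothetical, operands are treated as 32-bit values, and only the result and the C bit are computed (the overflow flags are not modeled).

```python
MASK32 = 0xFFFFFFFF

def shift_count(count_operand):
    """Interpret the 7 LSBs of the operand as a twos-complement count."""
    c = count_operand & 0x7F
    return c - 0x80 if c & 0x40 else c

def ash(dst, count_operand):
    """Return (result, carry) of an arithmetic shift of a 32-bit value."""
    n = shift_count(count_operand)
    value = dst & MASK32
    if n == 0:
        return value, 0                       # no shift, C cleared
    if n > 0:
        wide = value << n                     # zero-fill from the right
        return wide & MASK32, (wide >> 32) & 1
    n = -n
    signed = value - (1 << 32) if value & 0x80000000 else value
    carry = (signed >> (n - 1)) & 1           # last bit shifted out
    return (signed >> n) & MASK32, carry      # sign-extending shift

ex1 = ash(0x000AE000, 0x10)      # Example 1: left shift by 16
ex2 = ash(0xAEC00001, 0xFFE8)    # Example 2: right shift by 24
```

Both calls reproduce the register and C-bit values shown in the two examples above.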


ASH3        Arithmetic Shift, 3 Operands

Syntax      ASH3 count, src, dst

Operation   If (count ≥ 0):
                src << count → dst
            Else:
                src >> |count| → dst

Operands    src, count both type 1 or type 2 three-operand addressing modes

            dst register mode (any register in CPU primary register file)

Encoding    [Encoding diagrams, Type 1 and Type 2: bits 31-24, 23-16, 15-8, 7-0]

Instruction Word Fields

Type 1
T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)   register mode (any CPU register)
10   register mode (any CPU register)        indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)   indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        8-bit signed immediate
01   register mode (any CPU register)        indirect mode *+ARn(5-bit unsigned
                                             displacement)
10   indirect mode *+ARn(5-bit unsigned      8-bit signed immediate
     displacement)
11   indirect mode *+ARn1(5-bit unsigned     indirect mode *+ARn2(5-bit unsigned
     displacement)                           displacement)

Description The seven least-significant bits of the count operand are used to generate
            the twos-complement shift count.

            If the count operand is greater than zero, the src operand is left-shifted
            by the value of count. Low-order bits shifted in are zero-filled, and
            high-order bits are shifted out through the status register's C (carry)
            bit.

            Arithmetic left-shift:

                C ← src ← 0

            If the count operand is less than zero, the src operand is right-shifted
            by the absolute value of count (e.g., -4 = right-shift 4). The high-order
            bits of the src operand are sign-extended as they are right-shifted.
            Low-order bits are shifted out through the C (carry) bit.

            Arithmetic right-shift:

                sign of src → src → C

            If the count operand is zero, no shift is performed, and the C (carry) bit
            is set to 0. The count, src, and dst operands are assumed to be signed
            integers.

Cycles      1

Status Bits LUF Unaffected.
            LV  1 if an integer overflow occurs, unchanged otherwise.
            UF  0.
            N   MSB of the output.
            Z   1 if a zero result is generated, 0 otherwise.
            V   1 if an integer overflow occurs, 0 otherwise.
            C   Set to the value of the last bit shifted out. 0 for a shift count of 0.

Mode Bit    OVM Operation is not affected by OVM bit value.


ASH3||STI   Parallel ASH3 and STI

Syntax         ASH3 count, src2, dst1
            || STI  src3, dst2

Operation      If (count ≥ 0):
                   src2 << count → dst1
               Else:
                   src2 >> |count| → dst1
            || src3 → dst2

Operands    count register (R0-R7)
            src2  indirect (disp = 0, 1, IR0, IR1)
            dst1  register (R0-R7)
            src3  register (R0-R7)
            dst2  indirect (disp = 0, 1, IR0, IR1)

Encoding    [Encoding diagram: bits 31-24, 23-16, 15-8, 7-0]

Description The seven least-significant bits of the count operand register are used to
            generate the twos-complement shift count of up to 32 bits.

            If the count operand is greater than zero, the src2 operand is
            left-shifted by the value of the count operand. Low-order bits shifted in
            are zero-filled, and high-order bits are shifted out through the C (carry)
            bit.

            Arithmetic left-shift:

                C ← src2 ← 0

            If the count operand is less than zero, the src2 operand is right-shifted
            by the absolute value of the count operand. The high-order bits of the
            src2 operand are sign-extended as it is right-shifted. Low-order bits are
            shifted out through the C (carry) bit.

            Arithmetic right-shift:

                sign of src2 → src2 → C

            If the count operand is zero, no shift is performed, and the C (carry) bit
            is set to 0. The count, src2, and dst1 operands are assumed to be signed
            integers.

            All registers are read at the beginning and loaded at the end of the
            execute cycle. This means that if one of the parallel operations (STI)
            reads from a register and the operation being performed in parallel (ASH3)
            writes to the same register, then STI accepts as input the contents of the
            register before it is modified by the ASH3. If src2 and dst2 point to the
            same location, src2 is read before the write to dst2.

Cycles      1

Status Bits LUF Unaffected.
            LV  1 if an integer overflow occurs, unchanged otherwise.
            UF  0.
            N   MSB of the output.
            Z   1 if a zero result is generated, 0 otherwise.
            V   1 if an integer overflow occurs, 0 otherwise.
            C   Set to the value of the last bit shifted out. 0 for a shift count of 0.

Mode Bit    OVM Operation is not affected by OVM bit value.

Example        ASH3 R1,*AR6++(IR1),R0
            || STI  R5,*AR2

            Before Instruction:

            AR6 = 80 9900h
            IR1 = 8Ch
            R1 = 0FFE8h = -24
            R0 = 0h
            R5 = 35h = 53
            AR2 = 80 98A2h
            Data at 80 9900h = 0AE00 0000h
            Data at 80 98A2h = 0h
            LUF LV UF N Z V C = 0 0 0 0 0 0 0

            After Instruction:

            AR6 = 80 998Ch
            R1 = 0FFE8h = -24
            R0 = 0FFFF FFAEh
            R5 = 35h = 53
            AR2 = 80 98A2h
            Data at 80 9900h = 0AE00 0000h
            Data at 80 98A2h = 35h = 53
            LUF LV UF N Z V C = 0 0 0 1 0 0 0


Bcond       Branch Conditionally (Standard)

Syntax      Bcond src

Operation   If cond is true:
                If src is in register addressing mode (any register in CPU primary
                register file),
                    src → PC.
                If src is in PC-relative mode (label or address),
                    displacement + PC + 1 → PC.
            Else, continue.

Operands    src conditional-branch addressing modes (B):
                0  register
                1  PC-relative

Encoding    31        24 23        16 15         8 7          0
            011010 | B | 0000 | cond | register or displacement

Description Bcond signifies a standard branch that executes in four cycles, because a
            pipeline flush occurs when the condition is true (see Section 10.2 on
            page 10-4). A branch is performed if the condition is true. If the src
            operand is expressed in register addressing mode, the contents of the
            specified register are loaded into the PC. If the src operand is expressed
            in PC-relative mode, the assembler generates a displacement:
            displacement = label - (PC of branch instruction + 1). This displacement
            is stored as a 16-bit signed integer in the 16 least significant bits of
            the branch instruction word. This displacement is added to the PC of the
            branch instruction plus 1 to generate the new PC.

            The TMS320C40 provides 20 condition codes that can be used with this
            instruction (see Section 11.2 on page 11-10 for a list of condition
            mnemonics, encoding, and flags).

Cycles      4

Status Bits LUF Unaffected.
            LV  Unaffected.
            UF  Unaffected.
            N   Unaffected.
            Z   Unaffected.
            V   Unaffected.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.

Example     BZ R0

            Before Instruction:

            PC = 2B00h
            R0 = 0003 FF00h
            LUF LV UF N Z V C = 0 0 0 0 0 0 0

            After Instruction:

            PC = 3FF00h
            R0 = 0003 FF00h
            LUF LV UF N Z V C = 0 0 0 0 0 0 0
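
The PC-relative displacement arithmetic described for Bcond can be sketched in Python. The function names and the `delay` parameter are illustrative assumptions: `delay` is 1 for the standard form described here, and the range check simply reflects the 16-bit signed field.

```python
def encode_displacement(label, branch_pc, delay=1):
    """Assembler side: displacement = label - (PC of branch + delay)."""
    disp = label - (branch_pc + delay)
    if not -0x8000 <= disp <= 0x7FFF:
        raise ValueError("target out of 16-bit signed range")
    return disp & 0xFFFF          # 16 LSBs of the instruction word

def branch_target(field, branch_pc, delay=1):
    """CPU side: sign-extend the field and add PC of branch + delay."""
    disp = field - 0x10000 if field & 0x8000 else field
    return branch_pc + delay + disp

# Hypothetical forward and backward targets around a branch at 2B00h.
fwd = encode_displacement(0x2B10, 0x2B00)
back = encode_displacement(0x2AF0, 0x2B00)
```

Encoding and decoding are inverses of each other, so feeding either field back through `branch_target` returns the original label.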


BcondAF     Branch Conditionally Delayed and Annul If False

Syntax      BcondAF src

Operation   If (cond is true):
                If (src is a register)
                    src → PC
                If (src is a displacement)
                    src + PC of branch + 3 → PC
            Else (cond is false):
                annul execute phase results of next three instructions and continue

Operands    src conditional-branch addressing modes

Encoding    31        24 23        16 15         8 7          0
            011010 | B | 0101 | cond | register or displacement

Instruction Word Fields

            B   src addressing modes
            0   register mode
            1   PC-relative mode

Description If the condition is true, a branch and the three instructions following the
            branch instruction are executed. If the condition is false, the effect of
            the execute phase of the next three instructions is annulled and execution
            continues. If the src operand is in register mode, then the contents of the
            specified register are loaded into the PC. If the src operand is in
            PC-relative mode, then the sum of the PC of the branch instruction + 3 and
            the src is loaded into the PC. In PC-relative mode, the src field is
            interpreted as a 16-bit signed integer.

            None of the three instructions following the BcondAF may be an instruction
            that modifies the program flow. Interrupts are disabled for the duration of
            the BcondAF instruction. BcondAF is particularly useful for controlling the
            exit at the bottom of a loop.

Cycles      1

Status Bits LUF Unaffected.
            LV  Unaffected.
            UF  Unaffected.
            N   Unaffected.
            Z   Unaffected.
            V   Unaffected.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.


BcondAT     Branch Conditionally Delayed and Annul If True

Syntax      BcondAT src

Operation   If (cond is true):
                If (src is a register)
                    src → PC
                    annul execute phase results of next three instructions.
                If (src is a displacement)
                    src + PC of branch + 3 → PC
                    annul execute phase results of next three instructions.
            Else, continue.

Operands    src conditional-branch addressing modes

Encoding    31        24 23        16 15         8 7          0
            011010 | B | 0011 | cond | register or displacement

Instruction Word Fields

            B   src addressing modes
            0   register mode
            1   PC-relative mode

Description If the condition is true, the instruction performs a branch and annuls the
            effect of the execute phase of the next three instructions. If the src
            operand is expressed in register mode, then the contents of the specified
            register are loaded into the PC. If the src operand is in PC-relative mode,
            then the sum of the PC of the branch instruction + 3 and the src is loaded
            into the PC. In PC-relative mode, the src field is interpreted as a 16-bit
            signed integer.

            None of the three instructions following the BcondAT may be an instruction
            that modifies the program flow. Interrupts are disabled for the duration of
            the BcondAT instruction.

            The BcondAT instruction does not annul the status signals at the external
            interfaces. BcondAT is particularly useful for controlling the entry at the
            top of a loop.

Cycles      1

Status Bits LUF Unaffected.
            LV  Unaffected.
            UF  Unaffected.
            N   Unaffected.
            Z   Unaffected.
            V   Unaffected.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.
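
The difference between the two annulling forms can be summarized with a small Python model. It is a sketch under assumptions: `delayed_branch` is a hypothetical helper, the three delay-slot instructions are represented only by labels, and the fall-through address is taken to be the instruction after the three slots.

```python
def delayed_branch(cond, annul_if, slots, target, fallthrough):
    """Return (slots whose execute phase takes effect, next PC).

    annul_if is False for BcondAF (annul if false) and True for
    BcondAT (annul if true).
    """
    annulled = (cond == annul_if)
    executed = [] if annulled else list(slots)
    next_pc = target if cond else fallthrough
    return executed, next_pc

slots = ["i1", "i2", "i3"]       # the three instructions after the branch
branch_pc = 0x100                # hypothetical address of the branch
target = 0x200                   # hypothetical branch target
after_slots = branch_pc + 4      # first instruction past the delay slots

af_true = delayed_branch(True, False, slots, target, after_slots)
af_false = delayed_branch(False, False, slots, target, after_slots)
at_true = delayed_branch(True, True, slots, target, after_slots)
at_false = delayed_branch(False, True, slots, target, after_slots)
```

Read this way, BcondAF keeps the slot instructions only when the branch is taken, and BcondAT keeps them only when it is not.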


BcondD      Branch Conditionally (Delayed)

Syntax      BcondD src

Operation   If cond is true:
                If src is in register addressing mode (any register in CPU primary
                register file),
                    src → PC.
                If src is in PC-relative mode (label or address),
                    displacement + PC + 3 → PC.
            Else, continue.

Operands    src conditional-branch addressing modes (B):
                0  register
                1  PC-relative

Encoding    31        24 23        16 15         8 7          0
            011010 | B | 0001 | cond | register or displacement

Description BcondD signifies a delayed branch that allows the three instructions after
            the delayed branch to be fetched before the PC is modified. The effect is a
            single-cycle branch, and the three instructions following BcondD will not
            affect the cond. None of the three instructions following BcondD may be an
            instruction that modifies program flow.

            A branch is performed if the condition is true. If the src operand is
            expressed in register addressing mode, the contents of the specified
            register are loaded into the PC. If the src operand is expressed in
            PC-relative mode, the assembler generates a displacement:
            displacement = label - (PC of branch instruction + 3). This displacement
            is stored as a 16-bit signed integer in the 16 least significant bits of
            the branch instruction. This displacement is added to the PC of the branch
            instruction plus 3 to generate the new PC. The TMS320C40 provides 20
            condition codes that can be used with this instruction (see Section 11.2
            on page 11-10 for a list of condition mnemonics, encoding, and flags).

Cycles      1

Status Bits LUF Unaffected.
            LV  Unaffected.
            UF  Unaffected.
            N   Unaffected.
            Z   Unaffected.
            V   Unaffected.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.

Example     BNZD 36        (36 = 24h)

            Before Instruction:

            PC = 50h
            LUF LV UF N Z V C = 0 0 0 0 0 0 0

            After Instruction:

            PC = 77h
            LUF LV UF N Z V C = 0 0 0 0 0 0 0


BR          Branch Unconditionally (Standard)

Syntax      BR src

Operation   PC + 1 + src → PC

Operands    src 24-bit signed immediate displacement

Encoding    31        24 23        16 15         8 7          0
            0110000 | 0 | src (displacement)

Description Performs an unconditional branch. The src operand is assumed to be a
            24-bit signed integer.

Cycles      4

Status Bits LUF Unaffected.
            LV  Unaffected.
            UF  Unaffected.
            N   Unaffected.
            Z   Unaffected.
            V   Unaffected.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.
BRD         Branch Unconditionally (Delayed)

Syntax      BRD src

Operation   PC + 3 + src → PC

Operands    src 24-bit signed immediate displacement

Encoding    31        24 23        16 15         8 7          0
            0110000 | 1 | src (displacement)

Description Performs an unconditional delayed branch. The src operand is assumed to
            be a 24-bit signed integer. Interrupts are disabled during the BRD
            instruction.

            The three instructions following the BRD instruction are fetched and
            executed. None of these three instructions may modify the program flow
            (e.g., affect the PC value).

Cycles      1

Status Bits LUF Unaffected.
            LV  Unaffected.
            UF  Unaffected.
            N   Unaffected.
            Z   Unaffected.
            V   Unaffected.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.
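
The unconditional forms use a 24-bit signed displacement, so the field must be sign-extended before it is added to the PC. A brief Python sketch, with hypothetical helper names and made-up addresses:

```python
def sign_extend_24(field):
    """Interpret the low 24 bits as a twos-complement integer."""
    field &= 0xFFFFFF
    return field - 0x1000000 if field & 0x800000 else field

def br_target(pc, field):
    """Standard form: PC + 1 + src."""
    return pc + 1 + sign_extend_24(field)

def brd_target(pc, field):
    """Delayed form: PC + 3 + src."""
    return pc + 3 + sign_extend_24(field)

forward = br_target(0x100, 0x000010)     # 16 words past PC + 1
loop_back = brd_target(0x100, 0xFFFFFD)  # displacement of -3
```

A displacement of -3 in the delayed form lands back on the branch itself, which shows why the offset base differs from the standard form.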


CALL        Call Subroutine

Syntax      CALL src

Operation   Next PC → *(++SP)
            PC + 1 + src → PC

Operands    src 24-bit signed immediate displacement

Encoding    31        24 23        16 15         8 7          0
            0110001 | 0 | src (displacement)

Description Performs a call. The next PC value is pushed onto the system stack. The
            src operand + 1 + PC address of the CALL is loaded into the PC. The src
            operand is assumed to be a 24-bit signed immediate operand (displacement).

Cycles      4

Status Bits LUF Unaffected.
            LV  Unaffected.
            UF  Unaffected.
            N   Unaffected.
            Z   Unaffected.
            V   Unaffected.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.
CALLcond    Call Subroutine Conditionally

Syntax      CALLcond src

Operation   If cond is true:
                Next PC → *++SP
                If src is in register addressing mode (any register in CPU primary
                register file),
                    src → PC.
                If src is in PC-relative mode (label or address),
                    displacement + PC + 1 → PC.
            Else, continue.

Operands    src conditional-branch addressing modes (B):
                0  register
                1  PC-relative

Encoding    31        24 23        16 15         8 7          0
            011100 | B | 0000 | cond | register or displacement

Description A call is performed if the condition is true. If the condition is true, the
            next PC value is pushed onto the system stack. If the src operand is
            expressed in register addressing mode, the contents of the specified
            register are loaded into the PC. If the src operand is expressed in
            PC-relative mode, the assembler generates a displacement:
            displacement = label - (PC of call instruction + 1). This displacement is
            stored as a 16-bit signed integer in the 16 least significant bits of the
            call instruction word. This displacement is added to the PC of the call
            instruction plus 1 to generate the new PC.

            The TMS320C40 provides 20 condition codes that can be used with this
            instruction (see Section 11.2 on page 11-10 for a list of condition
            mnemonics, encoding, and flags).

Cycles      5

Status Bits LUF Unaffected.
            LV  Unaffected.
            UF  Unaffected.
            N   Unaffected.
            Z   Unaffected.
            V   Unaffected.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.

Example     CALLNZ R5

            Before Instruction:

            PC = 123h
            SP = 80 9835h
            R5 = 789h
            LUF LV UF N Z V C = 0 0 0 0 0 0 0

            After Instruction:

            PC = 789h
            SP = 80 9836h
            R5 = 789h
            Data at 80 9836h = 124h
            LUF LV UF N Z V C = 0 0 0 0 0 0 0
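
The stack behavior of a conditional call in register mode can be traced with a few lines of Python. The function `callcond` and the dictionary used as memory are illustrative assumptions; the model covers only the push of the return address and the PC update.

```python
def callcond(cond, pc, sp, mem, target):
    """Return (new_pc, new_sp); push the return address if cond is true."""
    if not cond:
        return pc + 1, sp        # condition false: continue with next word
    sp += 1                      # pre-increment, as in *++SP
    mem[sp] = pc + 1             # next PC is the return address
    return target, sp

# Values from the CALLNZ R5 example above (Z = 0, so NZ is true).
mem = {}
new_pc, new_sp = callcond(True, 0x123, 0x809835, mem, 0x789)
```

The trace matches the example: SP advances to 80 9836h and that location receives 124h.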
CMPF        Compare Floating-Point Values

Syntax      CMPF src, dst

Operation   dst - src

Operands    src general addressing modes (G):
                00  register (R0-R11)
                01  direct
                10  indirect
                11  immediate

            dst register (R0-R11)

Encoding    [Encoding diagram: bits 31-24, 23-16, 15-8, 7-0]

Description The src operand is subtracted from the dst operand. The result is not
            loaded into any register, thus allowing for nondestructive compares. The
            dst and src operands are assumed to be floating-point numbers.

Cycles      1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
            LV  1 if a floating-point overflow occurs, unchanged otherwise.
            UF  1 if a floating-point underflow occurs, 0 otherwise.
            N   1 if a negative result is generated, 0 otherwise.
            Z   1 if a zero result is generated, 0 otherwise.
            V   1 if a floating-point overflow occurs, 0 otherwise.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.

Example     CMPF *+AR4,R6

            Before Instruction:

            AR4 = 80 98F2h
            R6 = 07 0C80 0000h = 1.4050e+02
            Data at 80 98F3h = 070C 8000h = 1.4050e+02
            LUF LV UF N Z V C = 0 0 0 0 0 0 0

            After Instruction:

            AR4 = 80 98F2h
            R6 = 07 0C80 0000h = 1.4050e+02
            Data at 80 98F3h = 070C 8000h = 1.4050e+02
            LUF LV UF N Z V C = 0 0 0 0 1 0 0


CMPF3       Compare Floating-Point Values, 3 Operands

Syntax      CMPF3 src2, src1

Operation   src1 - src2

Operands    src1, src2 both type 1 or type 2 three-operand addressing modes

Encoding    [Encoding diagrams, Type 1 and Type 2: bits 31-24, 23-16, 15-8, 7-0]

Instruction Word Fields

Type 1
T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)   register mode (any CPU register)
10   register mode (any CPU register)        indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)   indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        8-bit signed immediate
01   register mode (any CPU register)        indirect mode *+ARn(5-bit unsigned
                                             displacement)
10   indirect mode *+ARn(5-bit unsigned      8-bit signed immediate
     displacement)
11   indirect mode *+ARn1(5-bit unsigned     indirect mode *+ARn2(5-bit unsigned
     displacement)                           displacement)

Description The src2 operand is subtracted from the src1 operand. The result is not
            loaded into any register. This allows for nondestructive compares. The
            src1 and src2 operands are assumed to be floating-point numbers.

Cycles      1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
            LV  1 if a floating-point overflow occurs, unchanged otherwise.
            UF  1 if a floating-point underflow occurs, 0 otherwise.
            N   1 if a negative result is generated, 0 otherwise.
            Z   1 if a zero result is generated, 0 otherwise.
            V   1 if a floating-point overflow occurs, 0 otherwise.
            C   Unaffected.

Mode Bit    OVM Operation is not affected by OVM bit value.


CMPI        Compare Integer

Syntax      CMPI src, dst

Operation   dst - src

Operands    src general addressing modes (G):
                00  register (any register in CPU primary register file)
                01  direct
                10  indirect
                11  immediate

            dst register (any register in CPU primary register file)

Encoding    [Encoding diagram: bits 31-24, 23-16, 15-8, 7-0]

Description The src operand is subtracted from the dst operand. The result is not
            loaded into any register, thus allowing for nondestructive compares. The
            dst and src operands are assumed to be signed integers.

Cycles      1

Status Bits LUF Unaffected.
            LV  1 if an integer overflow occurs, unchanged otherwise.
            UF  0.
            N   1 if a negative result is generated, 0 otherwise.
            Z   1 if a zero result is generated, 0 otherwise.
            V   1 if an integer overflow occurs, 0 otherwise.
            C   1 if a borrow occurs, 0 otherwise.

Mode Bit    OVM Operation is not affected by OVM bit value.

Example     CMPI R3,R7

            Before Instruction:

            R3 = 898h = 2200
            R7 = 3E8h = 1000
            LUF LV UF N Z V C = 0 0 0 0 0 0 0

            After Instruction:

            R3 = 898h = 2200
            R7 = 3E8h = 1000
            LUF LV UF N Z V C = 0 0 0 1 0 0 0
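
A nondestructive compare computes the difference only to derive flags. The Python sketch below illustrates that idea for the N and Z bits; the function name `cmpi` and the 32-bit wraparound are assumptions, and the overflow and carry bits are deliberately not modeled.

```python
MASK32 = 0xFFFFFFFF

def cmpi(dst, src):
    """Return the N and Z flags of dst - src without storing the result."""
    diff = (dst - src) & MASK32
    return {"N": (diff >> 31) & 1, "Z": int(diff == 0)}

# Values from the example above: R7 = 1000 (dst), R3 = 2200 (src).
r3, r7 = 0x898, 0x3E8
flags = cmpi(r7, r3)
```

Because nothing is written back, R3 and R7 keep their values and only the flags record that R7 is less than R3.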
CMPI3       Compare Integer, 3 Operands

Syntax      CMPI3 src2, src1

Operation   src1 - src2

Operands    src1, src2 both type 1 or type 2 three-operand addressing modes

Encoding    [Encoding diagrams, Type 1 and Type 2: bits 31-24, 23-16, 15-8, 7-0]

Instruction Word Fields

Type 1
T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        register mode (any CPU register)
01   indirect mode (disp = 0, 1, IR0, IR1)   register mode (any CPU register)
10   register mode (any CPU register)        indirect mode (disp = 0, 1, IR0, IR1)
11   indirect mode (disp = 0, 1, IR0, IR1)   indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T    src1 addressing modes                   src2 addressing modes
00   register mode (any CPU register)        8-bit signed immediate
01   register mode (any CPU register)        indirect mode *+ARn(5-bit unsigned
                                             displacement)
10   indirect mode *+ARn(5-bit unsigned      8-bit signed immediate
     displacement)
11   indirect mode *+ARn1(5-bit unsigned     indirect mode *+ARn2(5-bit unsigned
     displacement)                           displacement)

Description The src2 operand is subtracted from the src1 operand. The result is not
            loaded into any register. This allows for nondestructive compares. The
            src1 and src2 operands are assumed to be signed integers.

Cycles      1

Status Bits LUF Unaffected.
            LV  1 if an integer overflow occurs, unchanged otherwise.
            UF  0.
            N   1 if a negative result is generated, 0 otherwise.
            Z   1 if a zero result is generated, 0 otherwise.
            V   1 if an integer overflow occurs, 0 otherwise.
            C   1 if a borrow occurs, 0 otherwise.

Mode Bit    OVM Operation is not affected by OVM bit value.
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DBcond Decrement and Branch Conditionally (Standard) ; 


Syntax DBcond ARn, src


Operation ARn - 1 → ARn
If cond is true and ARn ≥ 0:
    If src is in register addressing mode (any register in CPU primary
    register file),
        src → PC.
    If src is in PC-relative mode (label or address),
        displacement + PC + 1 → PC.
Else, continue.


Operands src conditional-branch addressing modes (B):
    0 register
    1 PC-relative

ARn register (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0
... | cond | register or displacement


Description DBcond signifies a standard branch that executes in four cycles because
the pipeline must be flushed if cond is true. The specified auxiliary register
is decremented and a branch is performed if the condition is true and the
specified auxiliary register is greater than or equal to zero.


The auxiliary register is treated as a 32-bit signed integer. The most signifi- 
cant eight bits are unmodified by the decrement operation. The comparison 
of the auxiliary register uses only the 32 least significant bits of the auxiliary 
register. Note that the branch condition does not depend on the auxiliary 
register decrement. 


If the src operand is expressed in register addressing mode, the contents 
of the specified register are loaded into the PC. If the src operand is ex- 
pressed in PC-relative addressing mode, the assembler generates a dis- 
placement: displacement = label — (PC of branch instruction + 1). This inte- 
ger is stored as a 16-bit signed integer in the 16 least significant bits of the 
branch instruction word. This displacement is added to the PC of the branch 
instruction plus 1 to generate the new PC.


The TMS320C40 provides 20 condition codes that can be used with this in- 
struction (see Section 11.2 on page 11-10 for a list of condition mnemonics, 
encoding, and flags). 
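
The operation described above can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions, not the device's implementation: the function names `to_signed32` and `dbcond` are hypothetical, the condition is passed in as an already-evaluated boolean, and the pipeline timing (four cycles) is not modeled.

```python
def to_signed32(value):
    """Wrap an integer into the signed 32-bit range."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def dbcond(pc, arn, cond_true, src, pc_relative=False):
    """Model one DBcond step and return (new_pc, new_arn).

    cond_true is the branch condition as evaluated from the status flags;
    the decrement itself does not influence it.
    """
    new_arn = to_signed32(arn - 1)
    if cond_true and new_arn >= 0:
        # Register mode loads src directly; PC-relative mode adds the
        # displacement to (PC of the branch + 1).
        new_pc = (pc + 1 + src) if pc_relative else src
    else:
        new_pc = pc + 1  # no branch: continue with the next instruction
    return new_pc, new_arn


# Values taken from the DBLT AR3,R2 example in this section (N = 1, so LT is true).
new_pc, new_ar3 = dbcond(pc=0x5F, arn=0x12, cond_true=True, src=0x9F)
print(hex(new_pc), hex(new_ar3))
```

Note that the auxiliary register is decremented even when the branch is not taken, which is why the model updates `new_arn` before testing the condition.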




Cycles 4 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 
Example DBLT AR3,R2 


Before Instruction: 


PC = 5Fh 
AR3 = 12h 
R2 = 9Fh 
LUF LV UF N Z V C = 0 0 0 1 0 0 0


After Instruction: 


PC = 9Fh 
AR3 = 11h 
R2 = 9Fh 
LUF LV UF N Z V C = 0 0 0 1 0 0 0
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DBcondD Decrement and Branch Conditionally (Delayed) 


Syntax DBcondD ARn, src


Operation ARn - 1 → ARn
If cond is true and ARn ≥ 0:
    If src is in register addressing mode (any register in CPU primary
    register file),
        src → PC.
    If src is in PC-relative mode (label or address),
        displacement + PC + 3 → PC.
Else, continue.


Operands src conditional-branch addressing modes (B):
    0 register
    1 PC-relative

ARn register (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0
... | cond | register or displacement


Description DBcondD signifies a delayed branch that allows the three instructions after
the delayed branch to be fetched before the PC is modified. The effect is a
single-cycle branch. The specified auxiliary register is decremented and a
branch is performed if the condition is true and the specified auxiliary
register is greater than or equal to zero. (The three instructions following the
DBcondD must not affect the cond.)


The auxiliary register is treated as a 32-bit signed integer. The most signifi- 
cant eight bits are unmodified by the decrement operation. The comparison 
of the auxiliary register uses only the 32 least significant bits of the auxiliary 
register. Note that the branch condition does not depend on the auxiliary 
register decrement. 


If the src operand is expressed in register addressing mode, the contents 
of the specified register are loaded into the PC. If the src is expressed in 
PC-relative addressing, the assembler generates a displacement: displace- 
ment = label — (PC of branch instruction + 3). This displacement is added 
to the PC of the branch instruction plus 3 to generate the new PC. Note that 
bit 21 = 1 for a delayed branch. 


The TMS320C40 provides 20 condition codes that can be used with this in- 
struction (see Section 11.2 on page 11-10 for a list of condition mnemonics, 


encoding, and flags). 
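
The displacement arithmetic for the standard and delayed forms differs only in the offset added to the branch PC (1 versus 3). The following Python sketch illustrates that relationship; the function names are hypothetical, and the 16-bit signed field width is taken from the description of the standard branch, so treating the delayed form the same way is an assumption.

```python
def encode_displacement(label, branch_pc, delayed):
    """Compute the 16-bit displacement field an assembler would generate."""
    offset = 3 if delayed else 1
    displacement = label - (branch_pc + offset)
    if not -0x8000 <= displacement <= 0x7FFF:
        raise ValueError("displacement does not fit in 16 signed bits")
    return displacement & 0xFFFF  # stored as a twos-complement field


def branch_target(field, branch_pc, delayed):
    """Recover the new PC from a 16-bit displacement field."""
    displacement = field - 0x10000 if field & 0x8000 else field
    return branch_pc + (3 if delayed else 1) + displacement


# DBZD AR5,$+110h at PC = 0h: the label is 110h, so the field holds 110h - 3.
field = encode_displacement(label=0x110, branch_pc=0x0, delayed=True)
print(hex(field), hex(branch_target(field, 0x0, delayed=True)))
```

Encoding and decoding are inverses of each other, so the target recovered from the field always equals the original label.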




Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 
Example DBZD AR5,$+110h
Before Instruction: 


PC = 0h 
AR5 = 67h 
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction: 


PC = 110h 
AR5 = 66h 
LUF LV UF N Z V C = 0 0 0 0 0 0 0
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FIX Floating-Point to Integer Conversio 




Syntax FIX src, dst 
Operation fix(src) → dst


Operands src general addressing modes (G):
00 register (R0 - R11)
01 direct
10 indirect
11 immediate


dst register (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0


Description The floating-point operand src is converted to the nearest integer less than 
or equal to it in value, and the result is loaded into the dst register. The src 
operand is assumed to be a floating-point number and the dst operand a 
signed integer. 


The exponent field of the result register (if it has one) is not modified. 


Integer overflow occurs when the floating-point number is too large to be 
represented as a 32-bit twos-complement integer. In the case of integer 
overflow, the result will be saturated in the direction of overflow. 
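
The rounding and saturation rules just described can be illustrated with a short Python sketch. It is a model of the stated behavior only: the name `fix` mirrors the mnemonic, the input is an ordinary Python float rather than the device's floating-point format, and the returned overflow indicator is an illustrative stand-in for the V and LV flags.

```python
import math

INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def fix(value):
    """Convert a float to the nearest integer less than or equal to it.

    Returns (result, overflowed); the result saturates in the direction
    of overflow when it cannot be held in 32 twos-complement bits.
    """
    result = math.floor(value)
    if result > INT32_MAX:
        return INT32_MAX, True
    if result < INT32_MIN:
        return INT32_MIN, True
    return result, False


print(fix(1345.4))   # rounds toward minus infinity
print(fix(-1.5))     # negative values also round downward
print(fix(1e20))     # saturates at the largest positive integer
```

Rounding toward minus infinity means that negative non-integers move away from zero, which differs from the truncation performed by a C-style cast.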


Cycles 1 


Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the
condition flags are modified. If ST (SET COND) = 1, they are modified for all
destination registers.

LUF Unaffected. 

LV 1 if an integer overflow occurs, unchanged otherwise. 
UF 0. 

N 1 if a negative result is generated, 0 otherwise. 

Z 1 if a zero result is generated, 0 otherwise.

V 1 if an integer overflow occurs, 0 otherwise.

C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 




Example FIX R1,R2

Before Instruction:

R1 = 0A2820 0000h = 1.3454e + 3
R2 = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R1 = 0A2820 0000h = 1.3454e + 3
R2 = 541h = 1345
LUF LV UF N Z V C = 0 0 0 0 0 0 0
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FIX||STI_ Parallel FIX and STI 


Syntax FIX src2, dst1
|| STI src3, dst2


Operation fix(src2) → dst1
|| src3 → dst2


Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0 - R7)
src3 register (R0 - R7)
dst2 indirect (disp = 0, 1, IR0, IR1)


Encoding
31 24 23 16 15 8 7 0


Description A floating-point-to-integer conversion is performed. All registers are read at
the beginning and loaded at the end of the execute cycle. This means that
if one of the parallel operations (STI) reads from a register, and the operation
being performed in parallel (FIX) writes to the same register, then STI
accepts as input the contents of the register before it is modified by FIX.

If src2 and dst2 point to the same location, src2 is read before the write to
dst2.

Integer overflow occurs when the floating-point number is too large to be
represented as a 32-bit twos-complement integer. In the case of integer
overflow, the result will be saturated in the direction of overflow.


Cycles 1


Status Bits LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.




Example FIX *++AR4(1),R1
|| STI R0,*AR2


Before Instruction:

AR4 = 80 98A2h
R1 = 0h
R0 = 0DCh = 220
AR2 = 80 983Ch
Data at 80 98A3h = 733 C000h = 1.79750e + 02
Data at 80 983Ch = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:

AR4 = 80 98A3h
R1 = 0B3h = 179
R0 = 0DCh = 220
AR2 = 80 983Ch
Data at 80 98A3h = 733 C000h = 1.79750e + 02
Data at 80 983Ch = 0DCh = 220
LUF LV UF N Z V C = 0 0 0 0 0 0 0
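
The read-before-write rule for the parallel pair can be modeled as two phases: sample every source, then commit every result. The Python sketch below is a simplified illustration. The function name `fix_sti`, the dictionary-based register file and memory, and the use of Python floats for floating-point data are all assumptions; address updates and saturation are left out.

```python
import math


def fix_sti(regs, mem, fix_addr, fix_dst, sti_src, sti_addr):
    """Model FIX || STI with all operands read before any result is written."""
    # Read phase: both sources are sampled at the start of the execute cycle.
    fix_input = mem[fix_addr]
    sti_input = regs[sti_src]
    # Write phase: results are committed at the end of the cycle.
    regs[fix_dst] = math.floor(fix_input)
    mem[sti_addr] = sti_input


# Values from the example above (AR4 already advanced to 80 98A3h).
regs = {"R0": 220, "R1": 0}
mem = {0x8098A3: 179.75, 0x80983C: 0}
fix_sti(regs, mem, 0x8098A3, "R1", "R0", 0x80983C)
print(regs["R1"], mem[0x80983C])

# When STI stores the register that FIX writes, the old value is stored.
regs = {"R1": 7}
mem = {0x8098A3: 179.75, 0x80983C: 0}
fix_sti(regs, mem, 0x8098A3, "R1", "R1", 0x80983C)
print(regs["R1"], mem[0x80983C])
```

Separating the two phases is what guarantees that the store sees the register contents from before the conversion.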
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FLOAT Integer to Floating-Point Conversion — 


Syntax FLOAT src, dst


Operation float(src) → dst


Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate

dst register (R0 - R11)


Encoding
31 24 23 16 15 8 7 0


Description The integer operand src is converted to the floating-point value equal to it,
and the result loaded into the dst register. The src operand is assumed to
be a signed integer, and the dst operand a floating-point number.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example FLOAT *++AR2(2),R5


Before Instruction: 


AR2 = 80 9800h
R5 = 034C 2000h = 1.27578125e + 01
Data at 80 9802h = 0AEh = 174
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:

AR2 = 80 9802h
R5 = 072E00 0000h = 1.74e + 02
Data at 80 9802h = 0AEh = 174
LUF LV UF N Z V C = 0 0 0 0 0 0 0
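
The hexadecimal register images in these examples can be checked with a small decoder. The Python sketch below assumes the twos-complement floating-point layout defined in the data-formats chapter of this guide: an 8-bit twos-complement exponent followed by a sign bit and a 31-bit fraction with an implied leading bit, with the most negative exponent reserved for zero. The function name `decode_c40_float` is hypothetical.

```python
def decode_c40_float(exponent, mantissa):
    """Decode an 8-bit exponent field and a 32-bit mantissa field to a float."""
    e = exponent - 0x100 if exponent & 0x80 else exponent  # twos complement
    if e == -128:
        return 0.0  # reserved exponent value represents zero
    sign = (mantissa >> 31) & 1
    fraction = (mantissa & 0x7FFFFFFF) / 2**31
    # Implied leading bits: 01.f for positive values, 10.f (-2 + f) for negative.
    significand = (-2.0 + fraction) if sign else (1.0 + fraction)
    return significand * 2.0**e


print(decode_c40_float(0x07, 0x2E000000))  # register image 072E00 0000h
print(decode_c40_float(0x03, 0x4C200000))  # register image 034C20 0000h
```

Splitting a 40-bit register image such as 072E00 0000h into its first two hex digits (exponent) and remaining eight (mantissa) gives the two arguments.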




Parallel FLOAT and STF FLOAT||STF


Syntax FLOAT src2, dst1
|| STF src3, dst2

Operation float(src2) → dst1
|| src3 → dst2


Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0 - R7)
src3 register (R0 - R7)
dst2 indirect (disp = 0, 1, IR0, IR1)


Encoding
31 24 23 16 15 8 7 0

Description An integer-to-floating-point conversion is performed. All registers are read
at the beginning and loaded at the end of the execute cycle. This means that
if one of the parallel operations (STF) reads from a register and the operation
being performed in parallel (FLOAT) writes to the same register, then
STF accepts as input the contents of the register before it is modified by
FLOAT.


If src2 and dst2 point to the same location, src2 is read before the write to
dst2.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.
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FLOAT||STF Parallel FLOAT and STF — 


Example FLOAT *+AR2(IR0),R6
|| STF R7,*AR1


Before Instruction: 


AR2 = 80 98C5h
IR0 = 8h
R6 = 0h
R7 = 034C20 0000h = 1.27578125e + 01
AR1 = 80 9933h
Data at 80 98CDh = 0AEh = 174
Data at 80 9933h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:

AR2 = 80 98C5h
IR0 = 8h
R6 = 072E00 0000h = 1.740e + 02
R7 = 034C20 0000h = 1.27578125e + 01
AR1 = 80 9933h
Data at 80 98CDh = 0AEh = 174
Data at 80 9933h = 034C 2000h = 1.27578125e + 01
LUF LV UF N Z V C = 0 0 0 0 0 0 0




Convert From IEEE Format FRIEEE


Syntax FRIEEE src, dst 
Operation convert src from IEEE format → dst


Operands src direct or indirect addressing modes
dst extended-precision register (R0 - R11)


Encoding
31 24 23 16 15 8 7 0


Instruction Word Fields

G src addressing modes


Description The src operand is converted from the IEEE floating-point format to the
twos-complement floating-point format.


The src operand comes from memory. The converted result goes into an 
extended precision register as a single-precision floating-point number. 
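
As an illustration of what such a conversion involves, the Python sketch below maps an IEEE single-precision word to an (exponent, mantissa) pair in a twos-complement layout. It is a hedged model, not the hardware algorithm: the layout (signed 8-bit exponent, sign bit plus 31-bit fraction, exponent -128 reserved for zero) is assumed from the data-formats chapter, the name `frieee_model` is hypothetical, and infinities, NaNs, and values too small for the exponent range are rejected rather than guessed at.

```python
import math
import struct


def frieee_model(ieee_word):
    """Convert a 32-bit IEEE single word to (exponent, 32-bit mantissa)."""
    x = struct.unpack(">f", struct.pack(">I", ieee_word))[0]
    if x == 0.0:
        return -128, 0  # reserved exponent represents zero
    if math.isinf(x) or math.isnan(x):
        raise ValueError("infinities and NaNs are not modeled")
    m, p = math.frexp(abs(x))  # abs(x) = m * 2**p with 0.5 <= m < 1
    if x > 0:
        e = p - 1
        fraction = math.ldexp(x, -e) - 1.0   # positive values are 01.f
        sign = 0
    else:
        # Negative values are 10.f, so an exact power of two needs a smaller exponent.
        e = p - 2 if m == 0.5 else p - 1
        fraction = math.ldexp(x, -e) + 2.0
        sign = 1
    if e < -127:
        raise ValueError("value too small for the modeled exponent range")
    return e, (sign << 31) | int(fraction * 2**31)


print(frieee_model(0x432E0000))  # IEEE encoding of 174.0
print(frieee_model(0xBF800000))  # IEEE encoding of -1.0
```

The negative case shows why the two formats differ: IEEE stores a sign and magnitude, while the twos-complement mantissa represents -1.0 as -2 scaled by an exponent of -1.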


Cycles 1 


Status Bits LUF Unaffected. 
LV Set if overflow, otherwise unchanged. 
UF 0. 
N Sign of the result.
Z 1 if result is 0, 0 otherwise.
V 1 if overflow, 0 otherwise.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value. 
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FRIEEE| [STF Parallel FRIEEE and STF 


Syntax FRIEEE src2, dst1
|| STF src3, dst2


Operation convert src2 from IEEE format → dst1
in parallel with
src3 → dst2


Operands src2 indirect mode (disp = 0, 1, IR0, IR1)
dst1 register mode (R0 - R7)
src3 register mode (R0 - R7)
dst2 indirect mode (disp = 0, 1, IR0, IR1)


Encoding
31 24 23 16 15 8 7 0


Description The src2 operand is converted from the IEEE floating-point format to the
twos-complement format. The converted result goes into an extended-precision
register dst1 as a single-precision floating-point number.

A floating-point store is done in parallel.

If src2 and dst2 point to the same location, then src2 is read before the write
to dst2.


Cycles 1


Status Bits LUF Unaffected.
LV Set if overflow, otherwise unchanged.
UF 0.
N Sign of the result.
Z 1 if result is 0, 0 otherwise.
V 1 if overflow, 0 otherwise.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.




Interrupt Acknowledge IACK 




Syntax IACK src 


Operation Perform a dummy read operation with IACK = 0.
At end of dummy read, set IACK to 1.


Operands src general addressing modes (G):
01 direct
10 indirect


Encoding
31 24 23 16 15 8 7 0
000 | 110110 | G | 00000 | src


Description A dummy read operation is performed with IACK = 0. At the end of the
dummy read, IACK is set to 1 if off-chip memory is specified. This instruction
can be used to generate an external interrupt acknowledge. If the address
specified is off-chip, a read operation from that address is performed. The
IACK signal and the address can then be used to signal interrupt acknowledge
to external devices. The data read by the processor is unused.


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected.
V Unaffected. 
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 
Example IACK *AR5


Before Instruction: 


IACK = 1 
PC = 300h 
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction: 


IACK = 1 
PC = 301h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

IDLE Idle Until Interrupt


Syntax IDLE


Operation 1 → ST(GIE)
Next PC → PC
Idle until interrupt.


Operands None


Encoding
31 24 23 16 15 8 7 0
000 | 001100 | 00000000000000000000000


Description The global interrupt enable bit is set, the next PC value is loaded into the
PC, and the CPU idles until an interrupt is received. When the interrupt is
received, the contents of the PC are pushed onto the active system stack.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.




Link and Jump LAJ




Syntax LAJ src 


Operation PC of LAJ + 4 → extended-precision register R11
src + 3 + PC of LAJ → PC


Operands src 24-bit signed immediate displacement


Encoding
31 24 23 16 15 8 7 0
0110001 | 1 | src (displacement)


Description LAJ performs a single-cycle subroutine call. The three instructions following
the LAJ instruction are performed. The return address (address of the LAJ
instruction + 4) is placed in extended-precision register R11. The address
branched to is formed by adding the src operand to the PC of the LAJ
instruction + 3.


None of the three instructions following the LAJ instruction should modify
the program flow. Interrupts are disabled for the duration of the LAJ
instruction.
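
The two address computations in the LAJ operation can be written out as follows. This Python sketch is an illustration only: the names `sign_extend_24` and `laj` are hypothetical, the displacement is passed as the raw 24-bit field, and the execution of the three following instructions is not modeled.

```python
def sign_extend_24(field):
    """Interpret a 24-bit field as a signed displacement."""
    field &= 0xFFFFFF
    return field - 0x1000000 if field & 0x800000 else field


def laj(pc, field):
    """Return (target, return_address) for an LAJ located at pc."""
    return_address = pc + 4                    # value placed in R11
    target = pc + 3 + sign_extend_24(field)    # new PC
    return target, return_address


print([hex(v) for v in laj(0x100, 0x000010)])  # forward call
print([hex(v) for v in laj(0x100, 0xFFFFFD)])  # displacement of -3
```

The return address skips the three instructions that execute after LAJ, which is why it is PC + 4 rather than PC + 1.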


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected.
Z Unaffected. 
V Unaffected. 
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 
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LAJcond Link « and J Jump Conditionally 




Syntax LAJcond src


Operation If (cond is true)
    If (src is a register)
        PC of LAJcond + 4 → extended-precision register R11
        src → PC
    If (src is a displacement)
        PC of LAJcond + 4 → extended-precision register R11
        src + PC of the LAJcond + 3 → PC
Else, continue.


Operands src conditional-branch addressing modes
Encoding


31 24 23 16 15 8 7 0
011100 | B | 000 | 1 | cond | register or displacement


Instruction Word Fields


B src addressing modes
0 register mode
1 PC-relative mode


Description LAJcond performs a conditional single-cycle subroutine call. The three
instructions following the LAJcond instruction are performed. The return
address (address of the LAJcond instruction + 4) is placed in extended-precision
register R11. The address branched to is formed by either register mode
or PC-relative mode.


None of the three instructions following the LAJcond instruction may modify
the program flow. Interrupts are disabled for the duration of the LAJcond
instruction.


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 




Link and Trap Conditionally LATcond 
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Syntax LATcond N 


Operation If (cond is true)
    ST(GIE) → ST(PGIE)
    ST(CF) → ST(PCF)
    0 → ST(GIE)
    1 → ST(CF)
    PC of LATcond + 4 → extended-precision register R11
    trap vector N → PC
Else, continue.


Operands N immediate mode (trap number, 0 ≤ N ≤ 511)
Encoding
31 24 23 16 15 8 7 0


Description Performs a delayed conditional trap. If traps are to be nested, you may need
to save the status register before executing LATcond. If the condition is true,
ST bits GIE and CF are saved in PGIE and PCF in the status register. Then
all interrupts are disabled (0 → GIE), and the cache is frozen (1 → CF). The
contents of the PC of the LATcond + 4 are placed in R11, and the PC is
loaded with the contents of the specified trap vector (N). If the condition is
not true, then continue normal operation.


The three instructions following LATcond will be fetched and executed. 
They may not be instructions that modify the program flow or modify the sta- 
tus register. 
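
The status-register bookkeeping performed by a taken trap can be summarized in a short Python sketch. It is a behavioral illustration under stated assumptions: the function name `latcond`, the dictionary representation of the ST bits, and the `trap_vectors` lookup table are hypothetical, and the three delay-slot instructions are not modeled.

```python
def latcond(cond_true, pc, st, r11, trap_vectors, n):
    """Model LATcond; returns (new_pc, new_st, new_r11)."""
    if not 0 <= n <= 511:
        raise ValueError("trap number must be in the range 0 to 511")
    if not cond_true:
        return pc + 1, st, r11  # condition false: continue normally
    new_st = dict(st)
    new_st["PGIE"] = st["GIE"]  # save the global interrupt enable bit
    new_st["PCF"] = st["CF"]    # save the cache freeze bit
    new_st["GIE"] = 0           # disable all interrupts
    new_st["CF"] = 1            # freeze the cache
    return trap_vectors[n], new_st, pc + 4


st = {"GIE": 1, "CF": 0, "PGIE": 0, "PCF": 0}
print(latcond(True, 0x200, st, 0, {5: 0x4000}, 5))
```

Because only one previous copy of GIE and CF is kept, a nested trap would overwrite PGIE and PCF, which is why the description suggests saving the status register first.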


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.

Syntax LBb src, dst


Operation Sign-extended byte (3, 2, 1, 0) of src → dst
b = byte to load (3, 2, 1, 0)
3 | 2 | 1 | 0 = b (byte designator 3 - 0)
Operands src register, direct, or indirect addressing modes
dst register mode (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0


Instruction Word Fields


G src addressing modes
00 register mode
01 direct mode
10 indirect mode


B src byte
00 byte 0
01 byte 1
10 byte 2
11 byte 3


Description The specified byte of the src operand is sign-extended and right-shifted into
the 8 LSBs of the dst register. The src byte is signed.


Cycles 1




Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.

LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LB2 R1,R2 ; Sign-extended byte 2 of R1 → R2

Before Instruction:

R1 = 00AB 0000h
R2 = 0000 0000h


After Instruction:

R1 = 00AB 0000h
R2 = FFFF FFABh
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LBUb Load d Byte Unsigned 7 


Syntax LBUb src, dst


Operation Byte (3, 2, 1, 0) of src → dst
b = byte to load (3, 2, 1, 0)
3 | 2 | 1 | 0 = b (byte designator 3 - 0)


Operands src register, direct, or indirect addressing modes
dst register mode (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0


Instruction Word Fields


G src addressing modes
00 register mode (any CPU register)
01 direct mode
10 indirect mode


B src byte
00 byte 0
01 byte 1
10 byte 2
11 byte 3


Description The specified byte of the src operand is right-shifted, without sign
extension, into the 8 LSBs of the dst register. The src byte is unsigned.


Cycles 1


Status Bits If ST (SET COND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all destination
registers.

LUF Unaffected.
LV Unaffected.
UF 0.
N 0.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LBU2 R1,R2

Before Instruction:

R1 = 00AB 0000h
R2 = 0000 0000h


After Instruction:

R1 = 00AB 0000h
R2 = 0000 00ABh
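
The difference between the signed and unsigned byte loads comes down to how the upper 24 bits of the result are filled. A minimal Python sketch follows; the function names `lbu` and `lb` are hypothetical, and registers are modeled as unsigned 32-bit integers.

```python
def lbu(src, b):
    """Extract byte b (0 to 3) of a 32-bit word without sign extension."""
    if b not in (0, 1, 2, 3):
        raise ValueError("byte designator must be 0, 1, 2, or 3")
    return (src >> (8 * b)) & 0xFF


def lb(src, b):
    """Extract byte b (0 to 3) of a 32-bit word with sign extension."""
    byte = lbu(src, b)
    # Replicate bit 7 of the byte into bits 31-8 of the result.
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte


r1 = 0x00AB0000
print(f"{lb(r1, 2):08X}")   # signed load of byte 2
print(f"{lbu(r1, 2):08X}")  # unsigned load of byte 2
```

With R1 = 00AB 0000h, byte 2 is ABh, whose top bit is set, so the signed load fills the upper bits with ones and the unsigned load fills them with zeros.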
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LDA Load Adaress | Register 7 


Syntax LDA src, dst
Operation src → dst


Operands src general addressing modes
dst register mode (address registers only)


Encoding
31 24 23 16 15 8 7 0


Instruction Word Fields


G src addressing modes
00 register mode (any CPU register)
01 direct mode
10 indirect mode
11 immediate mode


Description The src operand is loaded into the dst register. The dst register may be
any of the address registers: AR0 - AR7, IR0, IR1, DP, BK or SP. The load
is done by the end of the read phase of the pipeline. As a result, LDA is one
cycle faster than LDI for loading these registers. (All operands are treated
as signed integers.)


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.

Syntax LDE src, dst


Operation src(exp) → dst(exp)


Operands src general addressing modes (G):
00 register (R0 - R11)
01 direct
10 indirect
11 immediate

dst register (R0 - R11)


Encoding
31 24 23 16 15 8 7 0


Description The exponent field of the src operand is loaded into the exponent field of the
dst register. No modification of the dst register mantissa field is made unless
the value of the exponent loaded is the reserved value of the exponent for
zero as determined by the precision of the src operand. Then, the mantissa
field of the dst register is set to zero. The src and dst operands are assumed
to be floating-point numbers. Immediate values are evaluated in the short
floating-point format.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.
Example LDE R0,R5
Before Instruction: 


R0 = 020005 6F30h = 4.00066337e + 00
R5 = 0A056F E332h = 1.06749648e + 03
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


R0 = 020005 6F30h = 4.00066337e + 00
R5 = 02056F E332h = 4.16990814e + 00
LUF LV UF N Z V C = 0 0 0 0 0 0 0
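
The field replacement shown in this example can be reproduced on 40-bit register images. The Python sketch below is a model for register sources only: the name `lde` is hypothetical, the exponent is assumed to occupy bits 39-32 with the mantissa in bits 31-0, and 80h is assumed to be the reserved exponent value for zero in this precision.

```python
ZERO_EXPONENT = 0x80  # assumed reserved exponent value (-128) for zero


def lde(src, dst):
    """Load the exponent field of src into dst (40-bit register images)."""
    exponent = (src >> 32) & 0xFF
    if exponent == ZERO_EXPONENT:
        # Loading the zero exponent also clears the mantissa.
        return exponent << 32
    return (exponent << 32) | (dst & 0xFFFFFFFF)


r0 = 0x0200056F30
r5 = 0x0A056FE332
print(f"{lde(r0, r5):010X}")
```

Only the top eight bits of the destination change in the ordinary case, so the mantissa of R5 is preserved while its scale follows R0.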
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LDEP Load Integer From Expansion Register File t to Primary Register File 




Syntax LDEP src, dst 
Operation src → dst


Operands src expansion register file register (IVTP or TVTP)
dst register mode (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0


Description This is a means to load a CPU register with the contents of the IVTP register
(interrupt-trap table pointer) or the TVTP register. These registers are
described in Section 3.2.


The src operand register from the expansion-register file is loaded into the
dst register in the primary register file. The dst register content is assumed
to be an integer.


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 




Load Floating-Point Value LDF




Syntax LDF src, dst
Operation src → dst


Operands src general addressing modes (G):
00 register (R0 - R11)
01 direct
10 indirect
11 immediate


dst register (R0 - R11)


Encoding
31 24 23 16 15 8 7 0


Description The src operand is loaded into the dst register. The dst and src operands 
are assumed to be floating-point numbers. 


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF 0. 
N 1 if a negative result is loaded, 0 otherwise.
Z 1 if a zero result is loaded, 0 otherwise.
V 0.
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 
Example LDF @9800h,R2 
Before Instruction:


DP = 80h
R2 = 0h
Data at 80 9800h = 10C 52A0h = 2.19254303e + 00
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


DP = 80h
R2 = 010C52 A000h = 2.19254303e + 00
Data at 80 9800h = 10C 52A0h = 2.19254303e + 00
LUF LV UF N Z V C = 0 0 0 0 0 0 0

LDFcond Load Floating-Point Value Conditionally

Syntax LDFcond src, dst


Operation If cond is true:
    src → dst.
Else:
    dst is unchanged.


Operands src general addressing modes (G):
00 register (R0 - R11)
01 direct
10 indirect
11 immediate


dst register (R0 - R11)
Encoding


31 24 23 16 15 8 7 0


Description If the condition is true, the src operand is loaded into the dst register.
Otherwise, the dst register is unchanged. The dst and src operands are assumed
to be floating-point numbers.


The TMS320C40 provides 20 condition codes that can be used with this
instruction (see Section 11.2 on page 11-10 for a list of condition mnemonics,
encoding, and flags). Note that an LDFU (load floating-point unconditionally)
instruction is useful for loading R0 - R11 without affecting condition flags.


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 




Example LDFZ R3,R5 


Before Instruction: 


R3 = 2CFF2C D500h = 1.77055560e +13 
R5 = 5F0000 003Eh = 3.96140824e + 28 
LUF LV UF N Z V C = 0 0 0 0 1 0 0


After Instruction: 


R3 = 2CFF2C D500h = 1.77055560e +13 
R5 = 2CFF2C D500h = 1.77055560e +13 
LUF LV UF N Z V C = 0 0 0 0 1 0 0
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LDFI Load Floating-Point Value, interlocked 


Syntax LDFI src, dst


Operation Signal interlocked operation.
src → dst


Operands src general addressing modes (G):
01 direct
10 indirect

dst register (R0 - R11)


Encoding
31 24 23 16 15 8 7 0


Description The src operand is loaded into the dst register. An interlocked operation is
signaled over LOCK or LLOCK. The src and dst operands are assumed to
be floating-point numbers. Note that only direct and indirect modes are
allowed. Refer to Section 6.5 (page 6-13) and Section 7.7 (page 7-39) for
detailed descriptions.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LDFI *+AR2,R7


Before Instruction: 

AR2 = 80 98F1h
R7 = 0h
Data at 80 98F2h = 584 C000h = -6.28125e + 01
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction: 


AR2 = 80 98F1h
R7 = 0584C0 0000h = -6.28125e + 01
Data at 80 98F2h = 584 C000h = -6.28125e + 01
LUF LV UF N Z V C = 0 0 0 1 0 0 0




Parallel LDF and LDF LDF||LDF 




Syntax LDF src2, dst2
|| LDF src1, dst1
Operation src2 → dst2
|| src1 → dst1


Operands src1 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0 - R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst2 register (R0 - R7)


Encoding
31 24 23 16 15 8 7 0


Description Two floating-point loads are performed in parallel. If the LDFs load the same 
register, the assembler issues a warning. The result is that of LDF src2, dst2. 


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 
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LDF||LDF Parallel LDF and LDF 


Example LDF *--AR1(IR0),R7
|| LDF *AR7++(1),R3


Before Instruction:

AR1 = 80 985Fh
IR0 = 8h
R7 = 0h
AR7 = 80 988Ah
R3 = 0h
Data at 80 9857h = 70C 8000h = 1.4050e + 02
Data at 80 988Ah = 57B 4000h = 6.281250e + 01
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:

AR1 = 80 9857h
R7 = 070C80 0000h = 1.4050e + 02
AR7 = 80 988Bh
R3 = 057B40 0000h = 6.281250e + 01
Data at 80 9857h = 70C 8000h = 1.4050e + 02
Data at 80 988Ah = 57B 4000h = 6.281250e + 01
LUF LV UF N Z V C = 0 0 0 0 0 0 0




Syntax LDF src2, dst1
|| STF src3, dst2


Operation src2 → dst1
|| src3 → dst2


Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0 - R7)
src3 register (R0 - R7)
dst2 indirect (disp = 0, 1, IR0, IR1)


Encoding
31 24 23 16 15 8 7 0


Description A floating-point load and a floating-point store are performed in parallel.

If src2 and dst2 point to the same location, src2 is read before the write to
dst2.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


11-105 


LDF||STF Parallel LDF and STF 


Fea saeaeseaeataesacdeteneaeeenesesetetesatateenesarattesatatatet es aaattetetatateteeeataeacaesataconasacacatoteeonanellesesaseoneesasatetessatoteesssatoetetatatesnanatoesenotetsatateetatensteasssocsesatoteretatonenstcesssatotsanesonenseesnenlogeraasnesenesssnnonesnsnee 


Example LDF *AR2--(1),R1
|| STF R3,*AR4++(IR1)


Before Instruction:


AR2 = 80 98E7h
R1 = 0h
R3 = 05 7B40 0000h = 6.28125e + 01
AR4 = 80 9900h
IR1 = 10h
Data at 80 98E7h = 70C 8000h = 1.4050e + 02
Data at 80 9900h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


AR2 = 80 98E6h

R1 = 07 0C80 0000h = 1.4050e + 02

R3 = 05 7B40 0000h = 6.28125e + 01

AR4 = 80 9910h

IR1 = 10h

Data at 80 98E7h = 70C 8000h = 1.4050e + 02
Data at 80 9900h = 57B 4000h = 6.28125e + 01
LUF LV UF N Z V C = 0 0 0 0 0 0 0


LDHI Load 16 MSBs With 16-Bit Immediate


Syntax LDHI src, dst
Operation src → 16 MSBs of dst


Operands src 16-bit unsigned immediate
dst register mode


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0; the src immediate value occupies the low-order field]


Description The 16-bit unsigned src immediate value is loaded into the 16 MSBs of the
dst register. 0 is loaded into the 16 LSBs of the dst register. The dst register
is assumed to be an integer.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LDHI 44h, R2

Before Instruction:

R2 = ABCD EF12h

After Instruction:

R2 = 0044 0000h
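
As a minimal illustration of the description above, the following Python sketch models the value LDHI leaves in the destination register. The function name ldhi is hypothetical, and the sketch models only the register result, not instruction timing.

```python
def ldhi(imm16):
    """Return the 32-bit dst value after LDHI: imm16 in the 16 MSBs, 0 in the 16 LSBs."""
    return (imm16 & 0xFFFF) << 16


# The previous contents of dst (ABCD EF12h in the example) play no part in the result.
print(f"{ldhi(0x44):08X}h")
```

Because the 16 LSBs are always cleared, the result depends only on the immediate operand.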
LDI Load Integer


Syntax LDI src, dst
Operation src → dst


Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate


dst register (any register in CPU primary register file)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7]


Description The src operand is loaded into the dst register. The dst and src operands
are assumed to be signed integers. An alternate form of LDI, LDP, is used
to load the data page pointer register (DP) or any other register with the eight
MSBs of a relocatable address. See the LDP instruction in this chapter and
subsection 11.3.2 (on page 11-15).


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 


Example LDI *-AR1(IR0),R5


Before Instruction:


AR1 = 2Ch 

IR0 = 5h

R5 = 3C5h = 965

Data at 27h = 26h = 38

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


AR1 = 2Ch

IR0 = 5h

R5 = 26h = 38

Data at 27h = 26h = 38

LUF LV UF N Z V C = 0 0 0 0 0 0 0


LDIcond Load Integer Conditionally


Syntax LDIcond src, dst


Operation If cond is true:
src → dst,
Else:
dst is unchanged.


Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate


dst register (any register in CPU primary register file)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Description If the condition is true, the src operand is loaded into the dst register.
Otherwise, the dst register is unchanged. The dst and src operands are assumed
to be signed integers.


The TMS320C40 provides 20 condition codes that can be used with this
instruction (see Section 11.2 on page 11-10 for a list of condition mnemonics,
encoding, and flags). Note that an LDIU (load integer unconditionally)
instruction is useful for loading a selected CPU register without affecting the
condition flags.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LDIZ R4,R6
Before Instruction:
R4 = 027Ch = 636


R6 = 0FE2h = 4,066
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:
R4 = 027Ch = 636


R6 = 0FE2h = 4,066
LUF LV UF N Z V C = 0 0 0 0 0 0 0
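
The example leaves R6 unchanged because the Z flag is 0 when LDIZ executes. A minimal Python sketch of that behavior follows. The function names and the dictionary used for the flags are hypothetical, and only the Z (zero) condition is modeled out of the 20 condition codes.

```python
def ldi_cond(cond_true, src, dst):
    """Return the new dst value: src if the condition holds, otherwise dst unchanged."""
    return src if cond_true else dst


def ldiz(flags, src, dst):
    """LDIZ: the load takes place only when the Z flag is set."""
    return ldi_cond(flags["Z"] == 1, src, dst)


flags = {"Z": 0}                      # status register state from the example
r6 = ldiz(flags, 0x027C, 0x0FE2)      # R4 is the source, R6 the destination
print(f"R6 = {r6:04X}h")
```

With Z = 1 the same call would return 027Ch instead. The status flags themselves are never written by the conditional load.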


LDII Load Integer, Interlocked


Syntax LDII src, dst


Operation Signal interlocked operation.
src → dst


Operands src general addressing modes (G):
01 direct
10 indirect
dst register (any register in CPU primary register file)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Description The src operand is loaded into the dst register. An interlocked operation is
signaled over LOCK or LLOCK. The src and dst operands are assumed
to be signed integers. Note that only the direct and indirect modes are
allowed. Refer to Section 7.7 on page 7-39 for a detailed description.


Cycles 1


Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all
destination registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.
Example LDII @985Fh,R3


Before Instruction:

DP = 80h

R3 = 0h

Data at 80 985Fh = 0DCh

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:

DP = 80h

R3 = 0DCh

Data at 80 985Fh = 0DCh

LUF LV UF N Z V C = 0 0 0 0 0 0 0


LDI||LDI Parallel LDI and LDI


Syntax LDI src2, dst2
|| LDI src1, dst1
Operation src2 → dst2
|| src1 → dst1


Operands src1 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst2 register (R0-R7)
Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Description Two integer loads are performed in parallel. A warning is issued by the
assembler if the LDIs load the same register. The result is that of LDI src2, dst2.


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LDI *-AR1(1),R7
|| LDI *AR7++(IR0),R1


Before Instruction:


AR1 = 80 9826h 

R7 = 0h

AR7 = 80 98C8h

IR0 = 10h

R1 = 0h

Data at 80 9825h = 0FAh = 250
Data at 80 98C8h = 2EEh = 750


LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


AR1 = 80 9826h

R7 = 0FAh = 250

AR7 = 80 98D8h

IR0 = 10h

R1 = 02EEh = 750

Data at 80 9825h = 0FAh = 250
Data at 80 98C8h = 2EEh = 750


LUF LV UF N Z V C = 0 0 0 0 0 0 0


LDI||STI Parallel LDI and STI


Syntax LDI src2, dst1
|| STI src3, dst2


Operation src2 → dst1
|| src3 → dst2


Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Description An integer load and an integer store are performed in parallel. If src2 and
dst2 point to the same location, src2 is read before the write to dst2.


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LDI *-AR1(1),R2
|| STI R7,*AR5++(IR0)


Before Instruction:


AR1 = 80 98E7h

R2 = 0h

R7 = 35h = 53

AR5 = 80 982Ch

IR0 = 8h

Data at 80 98E6h = 0DCh = 220

Data at 80 982Ch = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


AR1 = 80 98E7h
R2 = 0DCh = 220
R7 = 35h = 53
AR5 = 80 9834h
IR0 = 8h


Data at 80 98E6h = 0DCh = 220


Data at 80 982Ch = 35h = 53


LUF LV UF N Z V C = 0 0 0 0 0 0 0
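
The read-before-write rule for LDI||STI can be pictured as two phases: both operands are read first, and both results are written afterward. The Python sketch below is a simplified model under that reading. The function name ldi_sti and the dictionaries standing in for memory and registers are hypothetical, and auxiliary-register updates such as the post-increment of AR5 are deliberately left out.

```python
def ldi_sti(mem, regs, src2_addr, dst1, src3, dst2_addr):
    """Model LDI src2,dst1 || STI src3,dst2 with all reads before all writes."""
    loaded = mem[src2_addr]    # read phase: memory operand for the load
    stored = regs[src3]        # read phase: register operand for the store
    regs[dst1] = loaded        # write phase: load result
    mem[dst2_addr] = stored    # write phase: store result


# Values from the example above.
mem = {0x8098E6: 0xDC, 0x80982C: 0x0}
regs = {"R2": 0x0, "R7": 0x35}
ldi_sti(mem, regs, 0x8098E6, "R2", "R7", 0x80982C)
print(regs["R2"], mem[0x80982C])

# Same location used as load source and store destination:
# the old memory value reaches R2 before the store overwrites it.
mem2 = {0x100: 1}
regs2 = {"R2": 0, "R7": 9}
ldi_sti(mem2, regs2, 0x100, "R2", "R7", 0x100)
print(regs2["R2"], mem2[0x100])
```

In the second call R2 receives the original memory value and the location ends up holding R7, which is the behavior the description specifies when src2 and dst2 coincide.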


LDM Load Floating-Point Mantissa


Syntax LDM src, dst
Operation src (man) → dst (man)


Operands src general addressing modes (G):
00 register (R0-R11)
01 direct
10 indirect
11 immediate


dst register (R0-R11)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Description The mantissa field of the src operand is loaded into the mantissa field of the
dst register. The dst exponent field is not modified. The src and dst
operands are assumed to be floating-point numbers. If immediate
addressing mode is used, bits 15-12 of the instruction word are forced to
0 by the assembler. If the source is in memory, the 32-bit data are loaded
into the mantissa field.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LDM 156.75,R2 (156.75 = 07 1CC0 0000h)


Before Instruction:


R2 = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


R2 = 00 1CC0 0000h = 1.22460938e + 00
LUF LV UF N Z V C = 0 0 0 0 0 0 0
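
The effect of LDM on a 40-bit register can be sketched as a simple field replacement. The Python fragment below is a hedged illustration, assuming the 40-bit register image used in the example (8-bit exponent in bits 39-32, 32-bit mantissa in bits 31-0). The function name ldm is hypothetical.

```python
EXP_MASK = 0xFF00000000   # bits 39-32: exponent field
MAN_MASK = 0x00FFFFFFFF   # bits 31-0: mantissa field


def ldm(src40, dst40):
    """Return dst with its mantissa replaced by the src mantissa; exponent kept."""
    return (dst40 & EXP_MASK) | (src40 & MAN_MASK)


# 156.75 = 07 1CC0 0000h loaded into R2 = 0h, as in the example.
print(f"{ldm(0x071CC00000, 0x0000000000):010X}h")
```

Because R2 starts with an exponent of 0, the result 00 1CC0 0000h represents about 1.2246 rather than 156.75; only the mantissa was transferred.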
LDP Load Data Page Pointer


Syntax LDP src[, DP]
Operation src → data page pointer
Operands src is the 16 MSBs of the absolute 32-bit source address (src).
dst is optional (the data page pointer is understood if ",DP" is left out of the operand).


Encoding
Bits 31-29 28-23 22-21 20-16 15-0
Value 000 010000 11 10000 src


Description This pseudo-op is an alternate form of the LDI instruction, except that LDP
is always in the immediate addressing mode (bits 22-21 = 11 binary). The 16
MSBs of the absolute 32-bit src value are loaded into the 16 LSBs of the data
page pointer. An src of fewer than 32 bits is zero-filled to 32 bits. For
example, an src of any 16-bit value results in 16 zeros being placed in the DP,
because the 16 MSBs of the zero-extended src value are all zero.


The 16 LSBs of the pointer are used in direct addressing as a pointer to the
page of data being addressed. There is a total of 256 pages, each page 64K
words long. Bits 31-16 of the pointer are reserved and should be kept to
zero.


Cycles 1


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LDP @809900h, DP
or
LDP @809900h


Before Instruction:


DP = 6465h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


DP = 0080h (16 MSBs of 32-bit src, zeros extended)
LUF LV UF N Z V C = 0 0 0 0 0 0 0
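
The page arithmetic behind the example can be written out directly. The following Python sketch is illustrative: the function names are hypothetical, and the way a direct address is formed (DP in the upper 16 bits, the 16-bit operand in the lower 16 bits) is inferred from the direct-addressing examples in this chapter rather than quoted from it.

```python
def ldp(address):
    """Return the DP value LDP derives from a source address: its 16 MSBs of 32 bits."""
    return (address >> 16) & 0xFFFF


def direct_address(dp, offset16):
    """Combine the page pointer with a 16-bit direct operand (assumed concatenation)."""
    return ((dp & 0xFFFF) << 16) | (offset16 & 0xFFFF)


dp = ldp(0x809900)
print(f"DP = {dp:04X}h")
print(f"@985Fh resolves to {direct_address(dp, 0x985F):06X}h")
```

With DP = 0080h, the direct operand @985Fh refers to address 80 985Fh, which is the location used in the LDII example earlier in the chapter.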


LDPE Load Integer From Primary Register File to Expansion Register File


Syntax LDPE src, dst

Operation src → dst

Operands src register mode (any register in CPU primary register file)
dst expansion register file register (IVTP or TVTP)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Description The src operand register from the primary register file is loaded into the dst
register in the expansion register file. The dst operand is assumed to be an
integer.


This is a means to load the IVTP register (interrupt-vector table pointer) or
TVTP register (trap-vector table pointer). These registers are described in
Section 3.2 on page 3-15.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LDPE AR, TVTP ; set trap-vector pointer


LDPK Load Data-Page Pointer Immediate


Syntax LDPK src
Operation src → DP


Operands src 16-bit unsigned immediate


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0; the src immediate value occupies the low-order field]


Description The 16-bit unsigned immediate value is loaded into the DP register. This
operation is completed by the end of the decode phase of the LDPK
instruction; thus, the value loaded is ready for the next instruction for immediate
addressing.


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


LHw Load Half-Word


Syntax LHw src, dst


Operation Sign-extended half-word (0, 1) of src → dst
w = half-word to load (0, 1)


Operands src register, direct, or indirect addressing modes
dst register mode (any register in CPU primary register file)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Instruction Word Fields

G src addressing modes
00 register mode (Rn, 0 ≤ n ≤ 31)
01 direct mode
10 indirect mode

H src half-word
0 half-word 0 (LS half-word)
1 half-word 1 (MS half-word)


Description The specified half-word of the src operand is right-shifted into the
16 LSBs of the dst register and sign-extended. The src half-word is signed.


Cycles 1


Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all
destination registers.

LUF Unaffected.

LV Unaffected.

UF 0.

N 1 if a negative result is generated, 0 otherwise.

Z 1 if a zero result is generated, 0 otherwise.

V 0.

C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LH0 R1, R2

Before Instruction:

R1 = ABCD EF12h

R2 = 1234 5678h


After Instruction:


R1 = ABCD EF12h
R2 = FFFF EF12h


LHUw Load Half-Word Unsigned


Syntax LHUw src, dst


Operation Unsigned half-word (0, 1) of src → dst
w = half-word to load (0, 1)


Operands src register, direct, or indirect addressing modes
dst register mode (any register in CPU primary register file)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Instruction Word Fields

G src addressing modes
00 register mode (any CPU register)
01 direct mode
10 indirect mode

H src half-word
0 half-word 0 (LS half-word)
1 half-word 1 (MS half-word)


Description The specified half-word of the src operand is right-shifted into
the 16 LSBs of the dst register without sign extension. The src half-word is
unsigned.


Cycles 1


Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all
destination registers.

LUF Unaffected.

LV Unaffected.

UF 0.
N 0.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LHU0 R1, R2

Before Instruction:


R1 = ABCD EF12h
R2 = 1234 5678h


After Instruction:


R1 = ABCD EF12h


R2 = 0000 EF12h
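
The difference between LHw and LHUw is only in how the upper 16 bits of the destination are filled. A minimal Python sketch follows, with hypothetical function names lh and lhu that take the 32-bit source word and the half-word selector w.

```python
def lhu(src, w):
    """LHUw: select half-word w (0 = LS, 1 = MS) and zero-fill the 16 MSBs."""
    return (src >> (16 * w)) & 0xFFFF


def lh(src, w):
    """LHw: select half-word w and sign-extend it to 32 bits."""
    half = lhu(src, w)
    return half | 0xFFFF0000 if half & 0x8000 else half


r1 = 0xABCDEF12
print(f"LH0  -> {lh(r1, 0):08X}h")
print(f"LHU0 -> {lhu(r1, 0):08X}h")
```

For R1 = ABCD EF12h the low half-word EF12h has its top bit set, so the signed load yields FFFF EF12h while the unsigned load yields 0000 EF12h.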


LSH Logical Shift


Syntax LSH count, dst


Operation If count ≥ 0:
dst << count → dst
Else:
dst >> |count| → dst


Operands count general addressing modes (G):
00 register (any CPU register)
01 direct
10 indirect
11 immediate


dst register (any register in CPU primary register file)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Description The seven least significant bits of the count operand are used to generate
the twos-complement shift count. If the count operand is greater than zero,
the dst operand is left-shifted by the value of the count operand. Low-order
bits shifted in are zero-filled, and high-order bits are shifted out through the
C (carry) bit.


Logical left-shift:
C ← dst ← 0


If the count operand is less than zero, the dst is right-shifted by the absolute
value of the count operand. The high-order bits of the dst operand are
zero-filled as they are shifted to the right. Low-order bits are shifted out through
the C (carry) bit.


Logical right-shift:
0 → dst → C


If the count operand is 0, no shift is performed, and the C (carry) bit is set
to 0. The count operand is assumed to be a signed integer, and the dst
operand is assumed to be an unsigned integer.


Cycles 1


Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all
destination registers.

LUF Unaffected.

LV Unaffected.

UF 0.

N MSB of the output.

Z 1 if a zero output is generated, 0 otherwise.

V 0.

C Set to the value of the last bit shifted out. 0 for a shift count of 0.


Mode Bit OVM Operation is not affected by OVM bit value.
Example LSH R4,R7


Before Instruction:


R4 = 018h = 24
R7 = 02ACh
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


R4 = 018h = 24
R7 = 0AC00 0000h
LUF LV UF N Z V C = 0 0 0 1 0 1 0


Example LSH *-AR5(IR1),R5
Before Instruction:


AR5 = 80 9908h

IR1 = 4h

R5 = 12C0 0000h

Data at 80 9904h = 0FFFF FFF4h = -12

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


AR5 = 80 9908h

IR1 = 4h

R5 = 0001 2C00h

Data at 80 9904h = 0FFFF FFF4h = -12


LUF LV UF N Z V C = 0 0 0 0 0 0 0
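
The shift-count handling is the least obvious part of LSH, so a small model may help. The Python sketch below is an illustration under stated limits: it extracts the seven LSBs of the count as a twos-complement number and models shifts of at most 32 positions in either direction. Larger counts are rejected rather than modeled. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shift_count(count_word):
    """Interpret the seven LSBs of count_word as a twos-complement value (-64..63)."""
    c = count_word & 0x7F
    return c - 0x80 if c & 0x40 else c


def lsh(count_word, dst):
    """Return (result, carry) for a logical shift of the 32-bit value dst."""
    n = shift_count(count_word)
    dst &= MASK32
    if n == 0:
        return dst, 0                      # no shift, carry cleared
    if not -32 <= n <= 32:
        raise ValueError("shift counts beyond 32 positions are not modeled")
    if n > 0:
        carry = (dst >> (32 - n)) & 1      # last bit shifted out on the left
        return (dst << n) & MASK32, carry
    m = -n
    carry = (dst >> (m - 1)) & 1           # last bit shifted out on the right
    return dst >> m, carry


print([hex(v) for v in lsh(0x18, 0x02AC)])              # LSH R4,R7
print([hex(v) for v in lsh(0xFFFFFFF4, 0x12C00000)])    # count of -12 from memory
```

The first call reproduces the left shift by 24 (result AC00 0000h) and the second the right shift by 12 (result 0001 2C00h); in both cases the last bit shifted out is 0.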


LSH3 Logical Shift, 3-Operand


Syntax LSH3 count, src, dst


Operation If count ≥ 0:
src << count → dst
Else:
src >> |count| → dst


Operands src, count both type 1 or type 2 three-operand addressing modes


dst register mode (any register in CPU primary register file)
Encoding
Type 1 [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]
Type 2 [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Instruction Word Fields [table damaged in extraction; only the legible entries are listed]

Type 1
T src1 addressing modes / src2 addressing modes
00 register mode (any CPU register) / register mode (any CPU register)

Type 2
T src1 addressing modes / src2 addressing modes
00 register mode (any CPU register) / 8-bit signed immediate
(T value not legible) indirect mode *+ARn(5-bit unsigned displacement) / 8-bit signed immediate
11 indirect mode *+ARn1(5-bit unsigned displacement) / indirect mode *+ARn2(5-bit unsigned displacement)


Description The seven least significant bits of the count operand are used to generate
the twos-complement shift count.


If the count operand is greater than zero, the src operand is left-shifted by
the value of the count operand. Low-order bits shifted in are zero-filled, and
high-order bits are shifted out through the C (carry) bit.


Logical left-shift:
C ← src ← 0


If the count operand is less than zero, the src operand is right-shifted by the
absolute value of the count operand. The high-order bits of the dst operand
are zero-filled as they are shifted to the right. Low-order bits are shifted out
through the C (carry) bit.


Logical right-shift:
0 → src → C


If the count operand is 0, no shift is performed and the C (carry) bit is set
to 0. The count operand is assumed to be a signed integer. The src and dst
operands are assumed to be unsigned integers.


If count is greater than 32, the LSB ends up in the carry (C) bit. If count is
less than -32, 0 ends up in the carry bit. This also applies to LSH.


Cycles 1


Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all
destination registers.

LUF Unaffected.

LV Unaffected.

UF 0.

N MSB of the output.

Z 1 if a zero output is generated, 0 otherwise.

V 0.

C Set to the value of the last bit shifted out. 0 for a shift count of 0.
Unaffected if dst is not R0-R7.


Mode Bit OVM Operation is not affected by OVM bit value.
LSH3||STI Parallel LSH3 and STI


Syntax LSH3 count, src2, dst1
|| STI src3, dst2


Operation If count ≥ 0:
src2 << count → dst1
Else:
src2 >> |count| → dst1
|| src3 → dst2


Operands count register (R0-R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)
Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Description The seven least significant bits of the count operand are used to generate
the twos-complement shift count.


If the count operand is greater than zero, the src2 operand is left-shifted by
the value of the count operand. Low-order bits shifted in are zero-filled, and
high-order bits are shifted out through the C (carry) bit.


Logical left-shift:
C ← src2 ← 0


If the count operand is less than zero, the src2 operand is right-shifted by the
absolute value of the count operand. The high-order bits of the dst1 operand
are zero-filled as they are shifted to the right. Low-order bits are shifted out
through the C (carry) bit.
Logical right-shift:

0 → src2 → C
If the count operand is 0, no shift is performed and the carry bit is set to 0.


The count operand is assumed to be a 7-bit signed integer, and the src2 and
dst1 operands are assumed to be unsigned integers. All registers are read
at the beginning and loaded at the end of the execute cycle. This means that
if one of the parallel operations (STI) reads from a register and the operation
being performed in parallel (LSH3) writes to the same register, then STI
accepts as input the contents of the register before it is modified by the LSH3.


If src2 and dst2 point to the same location, src2 is read before the write to
dst2.


Cycles 1


Status Bits LUF Unaffected.

LV Unaffected.

UF 0.

N MSB of the output.

Z 1 if a zero output is generated, 0 otherwise.

V 0.

C Set to the value of the last bit shifted out. 0 for a shift count of 0.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LSH3 R2,*++AR3(1),R0
|| STI R4,*-AR5


Before Instruction:


R2 = 18h = 24

AR3 = 80 98C2h

R0 = 0h

R4 = 0DCh = 220

AR5 = 80 98A3h

Data at 80 98C3h = 0ACh

Data at 80 98A2h = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


R2 = 18h = 24

AR3 = 80 98C3h

R0 = 0AC00 0000h

R4 = 0DCh = 220

AR5 = 80 98A3h

Data at 80 98C3h = 0ACh

Data at 80 98A2h = 0DCh = 220

LUF LV UF N Z V C = 0 0 0 1 0 1 0


Example LSH3 R7,*AR2--(1),R2
|| STI R0,*+AR0(1)


Before Instruction:


R7 = 0FFFF FFF4h = -12

AR2 = 80 9863h

R2 = 0h

R0 = 12Ch = 300

AR0 = 80 98B7h

Data at 80 9863h = 2C00 0000h

Data at 80 98B8h = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


R7 = 0FFFF FFF4h = -12

AR2 = 80 9862h

R2 = 2C000h

R0 = 12Ch = 300

AR0 = 80 98B7h

Data at 80 9863h = 2C00 0000h

Data at 80 98B8h = 12Ch = 300

LUF LV UF N Z V C = 0 0 0 0 0 0 0
LWLct Load Word Left-Shifted


Syntax LWLct src, dst
Operation src << {0, 1, 2, or 3} bytes and merged with dst → dst


Operands ct the count of bytes {0, 1, 2, or 3} to shift left (ct x 8 = shift in bits)
src register, direct, or indirect addressing modes
dst register mode (any register in CPU primary register file)


Encoding [bit-field diagram not recoverable; marked bit positions: 24, 23, 16, 15]


Instruction Word Fields

G src addressing modes
00 register mode (any CPU register)
01 direct mode
10 indirect mode

ct byte shift
0 no shift
1 shift left 1 byte space
2 shift left 2 byte spaces
3 shift left 3 byte spaces


Description The src operand is left-shifted the specified number of bytes and merged
with the bytes of the dst register that are below the left-shifted LSB of the
src register.


Cycles 1


Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all
destination registers.

LUF Unaffected.

LV Unaffected.

UF 0.

N MSB of the output.

Z 1 if a zero result is generated, 0 otherwise.

V 0.

C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example LWL2 R1, R2

Before Instruction:

R1 = ABCD EF12h

R2 = 1234 5678h


After Instruction:


R1 = ABCD EF12h (remains unchanged)
EF12 0000h (left-shifted interim value)


R2 = EF12 5678h (contents merged)
LWRct Load Word Right-Shifted


Syntax LWRct src, dst
Operation src >> {0, 1, 2, or 3} bytes and merged with dst → dst


Operands ct the count of bytes {0, 1, 2, or 3} to shift right (ct x 8 = shift in bits)
src register, direct, or indirect addressing modes
dst register mode (any register in CPU primary register file)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7]


Instruction Word Fields

G src addressing modes
00 register mode (any CPU register)
01 direct mode
10 indirect mode

ct byte shift
0 no shift
1 shift right 1 byte space
2 shift right 2 byte spaces
3 shift right 3 byte spaces


Description The src operand is right-shifted the specified number of bytes and merged
with the bytes of the dst register that are above the right-shifted MSB of the
src register. Sign is not extended.


Cycles 1


Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all
destination registers.

LUF Unaffected.

LV Unaffected.

UF 0.

N MSB of the output.

Z 1 if a zero result is generated, 0 otherwise.
V 0.

C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.
Example LWR1 AR1, R2


Before Instruction:


AR1 = ABCD EF12h
R2 = 1234 5678h


After Instruction:


AR1 = ABCD EF12h (remains unchanged)
00AB CDEFh (right-shifted interim value)


R2 = 12AB CDEFh (contents merged)
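
The merge rule for the two word-shift loads can be stated as a mask operation. The Python sketch below is a hedged model of LWLct and LWRct built from the descriptions and examples above; the function names lwl and lwr are hypothetical, and status-flag updates are not modeled.

```python
MASK32 = 0xFFFFFFFF


def lwl(src, dst, ct):
    """LWLct: shift src left by ct bytes and keep the dst bytes below it."""
    bits = 8 * ct
    low_mask = (1 << bits) - 1             # dst bytes that survive
    return ((src << bits) & MASK32) | (dst & low_mask)


def lwr(src, dst, ct):
    """LWRct: shift src right by ct bytes (no sign extension), keep dst bytes above it."""
    bits = 8 * ct
    high_mask = MASK32 ^ (MASK32 >> bits)  # dst bytes that survive
    return ((src & MASK32) >> bits) | (dst & high_mask)


print(f"LWL2 -> {lwl(0xABCDEF12, 0x12345678, 2):08X}h")
print(f"LWR1 -> {lwr(0xABCDEF12, 0x12345678, 1):08X}h")
```

With ct = 0 both functions simply return src, since no dst bytes lie outside the shifted word.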
MBct Merge Byte, Left-Shifted


Syntax MBct src, dst


Operation 8 LSBs of src << {0, 1, 2, or 3} bytes and merged with dst → dst


Operands ct the count of bytes {0, 1, 2, 3} to shift left (ct x 8 = shift in bits)
src register, direct, or indirect addressing modes
dst register mode (any register in CPU primary register file)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Instruction Word Fields

G src addressing modes
00 register mode (any CPU register)
01 direct mode
10 indirect mode

ct byte shift
0 no shift
1 shift left 1 byte space
2 shift left 2 byte spaces
3 shift left 3 byte spaces


Description The 8 LSBs of the src operand are left-shifted (0, 1, 2, or 3) bytes and merged
with the dst register.


Cycles 1


Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all
destination registers.

LUF Unaffected.

LV Unaffected.

UF 0.

N MSB of the output.

Z 1 if a zero result is generated, 0 otherwise.
V 0.

C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example MB2 AR1, AR2


Before Instruction:


AR1 = ABCD EF12h
AR2 = 1234 5678h


After Instruction:


AR1 = ABCD EF12h (remains unchanged)
0012 0000h (left-shifted interim value)


AR2 = 1212 5678h (contents merged)
MHct Merge Half-Word, Left-Shifted


Syntax MHct src, dst


Operation 16 LSBs of src << {0, 1} half-words merged with dst → dst


Operands ct the count of half-word (16-bit) shifts

src register, direct, or indirect addressing modes

dst register mode (any register in CPU primary register file)
Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Instruction Word Fields

G src addressing modes
00 register mode (any CPU register)
01 direct mode
10 indirect mode

ct half-word shift
0 no shift
1 shift left 1 half-word (16 bits)


Description The 16 LSBs of the src operand are left-shifted (0, 1) half-words and merged
with the dst register.


Cycles 1


Status Bits If ST (SET COND) = 0 and the destination register is R0-R11, the condition
flags are modified. If ST (SET COND) = 1, they are modified for all
destination registers.

LUF Unaffected.

LV Unaffected.

UF 0.

N MSB of the output.

Z 1 if a zero result is generated, 0 otherwise.

V 0.

C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example MH1 AR1, AR2


Before Instruction:


AR1 = ABCD EF12h
AR2 = 1234 5678h


After Instruction:


AR1 = ABCD EF12h (remains unchanged)
EF12 0000h (left-shifted interim value)


AR2 = EF12 5678h (contents merged)
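
MBct and MHct both replace one field of the destination and leave the rest of it alone; they differ only in field width. The Python sketch below expresses that reading, which is an interpretation consistent with the two examples above rather than a statement of the hardware implementation. The helper merge_field and the wrappers mb and mh are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def merge_field(src, dst, width, index):
    """Replace field number `index` (each `width` bits wide) of dst with the low field of src."""
    field_mask = (1 << width) - 1
    shift = width * index
    kept = dst & ~(field_mask << shift) & MASK32   # dst with the target field cleared
    return kept | ((src & field_mask) << shift)


def mb(src, dst, ct):
    """MBct: merge the 8 LSBs of src into byte ct of dst."""
    return merge_field(src, dst, 8, ct)


def mh(src, dst, ct):
    """MHct: merge the 16 LSBs of src into half-word ct of dst."""
    return merge_field(src, dst, 16, ct)


print(f"MB2 -> {mb(0xABCDEF12, 0x12345678, 2):08X}h")
print(f"MH1 -> {mh(0xABCDEF12, 0x12345678, 1):08X}h")
```

For MB2 the byte 12h from the source lands in bits 23-16 of the destination, giving 1212 5678h; for MH1 the half-word EF12h lands in bits 31-16, giving EF12 5678h.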
MPYF Multiply Floating-Point Values


Syntax MPYF src, dst


Operation dst x src → dst


Operands src general addressing modes (G):
00 register (R0-R11)
01 direct
10 indirect
11 immediate


dst register (R0-R11)


Encoding [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Description The product of the dst and src operands is loaded into the dst register. The
src operand is assumed to be a single-precision floating-point number, and
the dst operand is an extended-precision floating-point number.


Cycles 1


Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV 1 if a floating-point overflow occurs, unchanged otherwise.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if a floating-point overflow occurs, 0 otherwise.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example MPYF R0,R2

Before Instruction:

R0 = 07 0C80 0000h = 1.4050e + 02


R2 = 03 4C20 0000h = 1.27578125e + 01


LUF LV UF N Z V C = 0 0 0 0 0 0 0
After Instruction:
R0 = 07 0C80 0000h = 1.4050e + 02


R2 = 0A 600F 2000h = 1.79247266e + 03
LUF LV UF N Z V C = 0 0 0 0 0 0 0


MPYF3 Multiply Floating-Point Values, 3 Operands


Syntax MPYF3 src2, src1, dst


Operation src1 x src2 → dst


Operands src1, src2 both type 1 or type 2 three-operand addressing modes


dst register mode (R0-R11)
Encoding
Type 1 [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]
Type 2 [bit-field diagram not recoverable; marked bit positions: 31, 24, 23, 16, 15, 8, 7, 0]


Instruction Word Fields [table damaged in extraction; only the legible entries are listed]

Type 1
src1 and src2 addressing modes: register mode (R0-R11) or indirect mode (disp = 0, 1, IR0, IR1)

Type 2
T src1 addressing modes / src2 addressing modes
01 register mode (R0-R11) / (entry not legible)
11 indirect mode *+ARn1(5-bit unsigned displacement) / indirect mode *+ARn2(5-bit unsigned displacement)


Description The product of src1 and src2 is loaded into the dst register. The values at
src1, src2, and dst are extended-precision floating-point numbers.


Cycles 1


Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV 1 if a floating-point overflow occurs, unchanged otherwise.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if a floating-point overflow occurs, 0 otherwise.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


MPYF3||ADDF3  Parallel MPYF3 and ADDF3

Syntax        MPYF3 srcA, srcB, dst1
           || ADDF3 srcC, srcD, dst2

Operation     srcA × srcB → dst1
           || srcC + srcD → dst2

Operands      srcA, srcB, srcC, srcD
                    Any two must be indirect (disp = 0, 1, IR0, IR1), and
                    any two must be register (R0-R7)

              dst1  register (d1):
                    0 = R0
                    1 = R1
              dst2  register (d2):
                    0 = R2
                    1 = R3
              src1  register (R0-R7)
              src2  register (R0-R7)
              src3  indirect (disp = 0, 1, IR0, IR1)
              src4  indirect (disp = 0, 1, IR0, IR1)

              P     parallel addressing modes (0 ≤ P ≤ 3)
                    Operation (P field)
                    00  src3 × src4, src1 + src2
                    01  src3 × src1, src4 + src2
                    10  src1 × src2, src3 + src4
                    11  src3 × src1, src2 + src4

Encoding      [32-bit instruction word diagram]

Description   A floating-point multiplication and a floating-point addition are performed in
              parallel. All registers are read at the beginning and loaded at the end of the
              execute cycle. This means that if one of the parallel operations (MPYF3)
              reads from a register and the operation being performed in parallel (ADDF3)
              writes to the same register, then MPYF3 accepts as input the contents of
              the register before it is modified by the ADDF3.

              Any combination of addressing modes may be coded for the four possible
              source operands as long as two are coded as indirect and two are register.
              The assignment of the source operands srcA-srcD to the src1-src4 fields
              varies, depending on the combination of addressing modes used; the P field
              is encoded accordingly. The assembler may, when not significant, change
              the order of operands in commutative operations in order to simplify
              processing.

              If src2 and dst2 point to the same location, src2 is read before the write to
              dst2.

Cycles        1

Status Bits   LUF 1 if a floating-point underflow occurs, unchanged otherwise.
              LV  1 if a floating-point overflow occurs, unchanged otherwise.
              UF  1 if a floating-point underflow occurs, 0 otherwise.
              N   0.
              Z   0.
              V   1 if a floating-point overflow occurs, 0 otherwise.
              C   Unaffected.

Mode Bit      OVM Operation is not affected by OVM bit value.


Example       MPYF3 *AR5++(1),*--AR1(IR0),R0
           || ADDF3 R5,R7,R3

              Before Instruction:

              AR5 = 80 98C5h
              AR1 = 80 98A8h
              IR0 = 4h
              R0 = 0h
              R5 = 07 33C0 0000h = 1.79750e + 02
              R7 = 07 0C80 0000h = 1.4050e + 02
              R3 = 0h
              Data at 80 98C5h = 34C 0000h = 1.2750e + 01
              Data at 80 98A4h = 111 0000h = 2.2500e + 00
              LUF LV UF N Z V C = 0 0 0 0 0 0 0

              After Instruction:

              AR5 = 80 98C6h
              AR1 = 80 98A4h
              IR0 = 4h
              R0 = 04 6718 0000h = 2.88867188e + 01
              R5 = 07 33C0 0000h = 1.79750e + 02
              R7 = 07 0C80 0000h = 1.4050e + 02
              R3 = 08 2020 0000h = 3.20250e + 02
              Data at 80 98C5h = 34C 0000h = 1.2750e + 01
              Data at 80 98A4h = 111 0000h = 2.2500e + 00
              LUF LV UF N Z V C = 0 0 0 0 0 0 0


MPYF3||STF  Parallel MPYF3 and STF

Syntax        MPYF3 src2, src1, dst1
           || STF src3, dst2

Operation     src1 × src2 → dst1
           || src3 → dst2

Operands      src1  register (R0-R7)
              src2  indirect (disp = 0, 1, IR0, IR1)
              dst1  register (R0-R7)
              src3  register (R0-R7)
              dst2  indirect (disp = 0, 1, IR0, IR1)

Encoding      [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]

Description   A floating-point multiplication and a floating-point store are performed in
              parallel. All registers are read at the beginning and loaded at the end of the
              execute cycle. This means that if one of the parallel operations (MPYF3)
              writes to a register and the operation being performed in parallel (STF)
              reads from the same register, then the STF accepts as input the contents
              of the register before it is modified by the MPYF3.

              If src2 and dst2 point to the same location, then src2 is read before the write
              to dst2.

Cycles        1

Status Bits   LUF 1 if a floating-point underflow occurs, unchanged otherwise.
              LV  1 if a floating-point overflow occurs, unchanged otherwise.
              UF  1 if a floating-point underflow occurs, 0 otherwise.
              N   1 if a negative result is generated, 0 otherwise.
              Z   1 if a zero result is generated, 0 otherwise.
              V   1 if a floating-point overflow occurs, 0 otherwise.
              C   Unaffected.

Mode Bit      OVM Operation is not affected by OVM bit value.

Example       MPYF3 *-AR2(1),R7,R0
           || STF R3,*AR0--(IR0)

              Before Instruction:

              AR2 = 80 982Bh
              R7 = 05 7B40 0000h = 6.281250e + 01
              R0 = 0h
              R3 = 08 6B28 0000h = 4.7031250e + 02
              AR0 = 80 9860h
              IR0 = 8h
              Data at 80 982Ah = 70C 8000h = 1.4050e + 02
              Data at 80 9860h = 0h
              LUF LV UF N Z V C = 0 0 0 0 0 0 0

              After Instruction:

              AR2 = 80 982Bh
              R7 = 05 7B40 0000h = 6.281250e + 01
              R0 = 0D 09E4 A000h = 8.82515625e + 03
              R3 = 08 6B28 0000h = 4.7031250e + 02
              AR0 = 80 9858h
              IR0 = 8h
              Data at 80 982Ah = 70C 8000h = 1.4050e + 02
              Data at 80 9860h = 86B 2800h = 4.7031250e + 02
              LUF LV UF N Z V C = 0 0 0 0 0 0 0
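
The read-before-write rule stated in the MPYF3||STF description can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions, not a cycle-accurate simulator: the function name mpyf3_stf and the dictionary-based register and memory model are hypothetical, values are held as ordinary Python floats rather than C4x floating-point words, and the address-register updates of the indirect modes are left out.

```python
def mpyf3_stf(regs, mem, src1, src2_addr, dst1, src3, dst2_addr):
    """Behavioral sketch of MPYF3 || STF: all reads happen before any write."""
    # Read phase: every source operand is sampled at the start of the cycle.
    a = regs[src1]
    b = mem[src2_addr]
    store_value = regs[src3]
    # Write phase: results are committed at the end of the cycle.
    regs[dst1] = a * b
    mem[dst2_addr] = store_value


# Values from the MPYF3||STF example (decimal equivalents of the hex words).
regs = {"R0": 0.0, "R3": 470.3125, "R7": 62.8125}
mem = {0x80982A: 140.5, 0x809860: 0.0}
mpyf3_stf(regs, mem, "R7", 0x80982A, "R0", "R3", 0x809860)
print(regs["R0"], mem[0x809860])

# Hazard case: STF reads R0 while MPYF3 writes R0 in the same instruction.
regs2 = {"R0": 2.0, "R1": 3.0}
mem2 = {0: 4.0, 1: 0.0}
mpyf3_stf(regs2, mem2, "R1", 0, "R0", "R0", 1)
print(regs2["R0"], mem2[1])
```

In the hazard case the store writes the old value of R0 (2.0) to memory while R0 itself receives the new product (12.0), which is the behavior the description specifies.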


MPYF3||SUBF3  Parallel MPYF3 and SUBF3

Syntax        MPYF3 srcA, srcB, dst1
           || SUBF3 srcC, srcD, dst2

Operation     srcA × srcB → dst1
           || srcD - srcC → dst2

Operands      srcA, srcB, srcC, srcD
                    Any two must be indirect (disp = 0, 1, IR0, IR1), and
                    any two must be register (R0-R7)

              dst1  register (d1):
                    0 = R0
                    1 = R1
              dst2  register (d2):
                    0 = R2
                    1 = R3
              src1  register (R0-R7)
              src2  register (R0-R7)
              src3  indirect (disp = 0, 1, IR0, IR1)
              src4  indirect (disp = 0, 1, IR0, IR1)

              P     parallel addressing modes (0 ≤ P ≤ 3)
                    Operation (P field)
                    00  src3 × src4, src1 - src2
                    01  src3 × src1, src4 - src2
                    10  src1 × src2, src3 - src4
                    11  src3 × src1, src2 - src4

Encoding      [32-bit instruction word diagram]

Description   A floating-point multiplication and a floating-point subtraction are performed
              in parallel. All registers are read at the beginning and loaded at the end of
              the execute cycle. This means that if one of the parallel operations (MPYF3)
              reads from a register, and the operation being performed in parallel
              (SUBF3) writes to the same register, then MPYF3 accepts as input the
              contents of the register before it is modified by the SUBF3.

              Any combination of addressing modes may be coded for the four possible
              source operands as long as two are coded as indirect and two are register.
              The assignment of the source operands srcA-srcD to the src1-src4 fields
              varies, depending on the combination of addressing modes used; the P field
              is encoded accordingly. The assembler may, when not significant, change
              the order of operands in commutative operations in order to simplify
              processing.

Cycles        1

Status Bits   LUF 1 if a floating-point underflow occurs, unchanged otherwise.
              LV  1 if a floating-point overflow occurs, unchanged otherwise.
              UF  1 if a floating-point underflow occurs, 0 otherwise.
              N   0.
              Z   0.
              V   1 if a floating-point overflow occurs, 0 otherwise.
              C   Unaffected.

Mode Bit      OVM Operation is not affected by OVM bit value.

Example       MPYF3 R5,*++AR7(IR1),R0
           || SUBF3 R7,*AR3--(1),R2

              or

              MPYF3 *++AR7(IR1),R5,R0
           || SUBF3 R7,*AR3--(1),R2

              Before Instruction:

              R5 = 03 4C00 0000h = 1.2750e + 01
              AR7 = 80 9904h
              IR1 = 8h
              R0 = 0h
              R7 = 07 33C0 0000h = 1.79750e + 02
              AR3 = 80 98B2h
              R2 = 0h
              Data at 80 990Ch = 111 0000h = 2.250e + 00
              Data at 80 98B2h = 70C 8000h = 1.4050e + 02
              LUF LV UF N Z V C = 0 0 0 0 0 0 0

              After Instruction:

              R5 = 03 4C00 0000h = 1.2750e + 01
              AR7 = 80 990Ch
              IR1 = 8h
              R0 = 04 6718 0000h = 2.88867188e + 01
              R7 = 07 33C0 0000h = 1.79750e + 02
              AR3 = 80 98B1h
              R2 = 05 E300 0000h = -3.9250e + 01
              Data at 80 990Ch = 111 0000h = 2.250e + 00
              Data at 80 98B2h = 70C 8000h = 1.4050e + 02
              LUF LV UF N Z V C = 0 0 0 0 0 0 0


MPYI  Multiply Integer

Syntax        MPYI src, dst

Operation     dst × src → dst

Operands      src general addressing modes (G):
              00 register (any CPU register)
              01 direct
              10 indirect
              11 immediate

              dst register (any register in CPU primary register file)

Encoding      [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]

Description   The product of the dst and src operands is loaded into the dst register. The
              src and dst operands, when read, are assumed to be 32-bit signed integers.
              The result is assumed to be a 64-bit signed integer. The output to the dst
              register is the 32 least significant bits of the result.

              Integer overflow occurs when any of the most significant 32 bits of the 64-bit
              result differs from the most significant bit of the 32-bit output value.

Cycles        1

Status Bits   If ST(SET COND) = 0 and the destination register is R0-R11, the condition
              flags are modified. If ST(SET COND) = 1, they are modified for all
              destination registers.

              LUF Unchanged.
              LV  1 if an integer overflow occurs, unchanged otherwise.
              UF  0.
              N   1 if a negative result is generated, 0 otherwise.
              Z   1 if a zero result is generated, 0 otherwise.
              V   1 if an integer overflow occurs, 0 otherwise.
              C   Unaffected.

Mode Bit      OVM Operation is affected by OVM bit value.

Example       MPYI R1,R5

              Before Instruction:

              R1 = 00 0033 C251h = 3,392,081
              R5 = 00 0078 B600h = 7,910,912
              LUF LV UF N Z V C = 0 0 0 0 0 0 0

              After Instruction:

              R1 = 00 0033 C251h = 3,392,081
              R5 = 00 E21D 9600h = -501,377,536
              LUF LV UF N Z V C = 0 1 0 1 0 1 0
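
The overflow rule in the MPYI description can be written out directly. The following Python sketch is illustrative only: the names mpyi and to_signed32 are hypothetical, the model assumes OVM = 0 (no saturation is modeled, because the description only states that OVM affects the operation), and it returns the flags as a dictionary rather than updating a status register.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(word):
    """Interpret the low 32 bits of word as a two's-complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def mpyi(src, dst, lv=0):
    """Sketch of MPYI with OVM = 0: return (32-bit result word, flags)."""
    product = to_signed32(src) * to_signed32(dst)   # exact 64-bit signed result
    low = product & MASK32                          # 32 LSBs go to dst
    high = (product >> 32) & MASK32                 # 32 MSBs of the 64-bit result
    msb = low >> 31
    # Overflow: any of the upper 32 bits differs from the MSB of the output.
    overflow = high != (MASK32 if msb else 0)
    flags = {
        "LV": 1 if overflow else lv,   # latched: set on overflow, else unchanged
        "UF": 0,
        "N": msb,
        "Z": int(low == 0),
        "V": int(overflow),
    }
    return low, flags


result, flags = mpyi(0x0033C251, 0x0078B600)   # operands from the MPYI example
print(hex(result), to_signed32(result), flags)
```

With the operands of the example the sketch reproduces R5 = E21D 9600h (-501,377,536) and the flag values LV = 1, N = 1, V = 1 shown after the instruction.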
MPYI3  Multiply Integer, 3 Operands


Syntax        MPYI3 src2, src1, dst

Operation     src1 × src2 → dst

Operands      src1, src2  both type 1 or type 2 three-operand addressing modes
              dst         register mode (any register in CPU primary register file)

Encoding      Type 1  [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]
              Type 2  [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]

              Instruction Word Fields

              Type 1
              T    src1 addressing modes                    src2 addressing modes
              00   register mode (any CPU register)         register mode (any CPU register)
              01   indirect mode (disp = 0, 1, IR0, IR1)    register mode (any CPU register)
              10   register mode (any CPU register)         indirect mode (disp = 0, 1, IR0, IR1)
              11   indirect mode (disp = 0, 1, IR0, IR1)    indirect mode (disp = 0, 1, IR0, IR1)

              Type 2
              T    src1 addressing modes                    src2 addressing modes
              00   register mode (any CPU register)         8-bit signed immediate
              01   register mode (any CPU register)         indirect mode *+ARn(5-bit unsigned
                                                            displacement)
              10   indirect mode *+ARn(5-bit unsigned       8-bit signed immediate
                   displacement)
              11   indirect mode *+ARn1(5-bit unsigned      indirect mode *+ARn2(5-bit unsigned
                   displacement)                            displacement)

Description   The product of the numbers at src1 and src2 is loaded into the dst register.
              The multiplied numbers are assumed to be 32-bit signed integers. The
              result is assumed to be a signed 64-bit integer. The output to the dst register
              is the 32 least significant bits of the result.

Cycles        1

Status Bits   If ST(SET COND) = 0 and the destination register is R0-R11, the condition
              flags are modified. If ST(SET COND) = 1, they are modified for all
              destination registers.

              LUF Unchanged.
              LV  1 if an integer overflow occurs, unchanged otherwise.
              UF  0.
              N   1 if a negative result is generated, 0 otherwise.
              Z   1 if a zero result is generated, 0 otherwise.
              V   1 if an integer overflow occurs, 0 otherwise.
              C   Unaffected.

Mode Bit      OVM Operation is affected by OVM bit value.


Note          Integer overflow occurs when any of the most significant 32 bits of the 64-bit
              result differs from the most significant bit of the 32-bit output value.


MPYI3||ADDI3  Parallel MPYI3 and ADDI3

Syntax        MPYI3 srcA, srcB, dst1
           || ADDI3 srcC, srcD, dst2

Operation     srcA × srcB → dst1
           || srcD + srcC → dst2

Operands      srcA, srcB, srcC, srcD
                    Any two must be indirect (disp = 0, 1, IR0, IR1), and
                    any two must be register (R0-R7)

              dst1  register (d1):
                    0 = R0
                    1 = R1
              dst2  register (d2):
                    0 = R2
                    1 = R3
              src1  register (R0-R7)
              src2  register (R0-R7)
              src3  indirect (disp = 0, 1, IR0, IR1)
              src4  indirect (disp = 0, 1, IR0, IR1)

              P     parallel addressing modes (0 ≤ P ≤ 3)
                    Operation (P field)
                    00  src3 × src4, src1 + src2
                    01  src3 × src1, src4 + src2
                    10  src1 × src2, src3 + src4
                    11  src3 × src1, src2 + src4

Encoding      [32-bit instruction word diagram]

Description   An integer multiplication and an integer addition are performed in parallel.
              All registers are read at the beginning and loaded at the end of the execute
              cycle. This means that if one of the parallel operations (MPYI3) reads from
              a register and the operation being performed in parallel (ADDI3) writes to
              the same register, then MPYI3 accepts as input the contents of the register
              before it is modified by the ADDI3.

              Any combination of addressing modes may be coded for the four possible
              source operands as long as two are coded as indirect and two are register.
              The assignment of the source operands srcA-srcD to the src1-src4 fields
              varies, depending on the combination of addressing modes used; the P field
              is encoded accordingly. The assembler may, when not significant, change
              the order of operands in commutative operations in order to simplify
              processing.

Cycles        1

Status Bits   LUF Unchanged.
              LV  1 if an integer overflow occurs, unchanged otherwise.
              UF  0.
              N   0.
              Z   0.
              V   1 if an integer overflow occurs, 0 otherwise.
              C   Unaffected.

Mode Bit      OVM Operation is affected by OVM bit value.

Example       MPYI3 R7,R4,R0
           || ADDI3 *-AR3,*AR5--(1),R3

              Before Instruction:

              R7 = 14h = 20
              R4 = 64h = 100
              R0 = 0h
              AR3 = 80 981Fh
              AR5 = 80 996Eh
              R3 = 0h
              Data at 80 981Eh = 0FFFF FFCBh = -53
              Data at 80 996Eh = 35h = 53
              LUF LV UF N Z V C = 0 0 0 0 0 0 0

              After Instruction:

              R7 = 14h = 20
              R4 = 64h = 100
              R0 = 07D0h = 2000
              AR3 = 80 981Fh
              AR5 = 80 996Dh
              R3 = 0h
              Data at 80 981Eh = 0FFFF FFCBh = -53
              Data at 80 996Eh = 35h = 53

              LUF LV UF N Z V C = 0 0 0 0 0 0 0


MPYI3||STI  Parallel MPYI3 and STI

Syntax        MPYI3 src2, src1, dst1
           || STI src3, dst2

Operation     src1 × src2 → dst1
           || src3 → dst2

Operands      src1  register (R0-R7)
              src2  indirect (disp = 0, 1, IR0, IR1)
              dst1  register (R0-R7)
              src3  register (R0-R7)
              dst2  indirect (disp = 0, 1, IR0, IR1)

Encoding      [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]

Description   An integer multiplication and an integer store are performed in parallel. All
              registers are read at the beginning and loaded at the end of the execute
              cycle. This means that if one of the parallel operations (STI) reads from a
              register and the operation being performed in parallel (MPYI3) writes to the
              same register, then STI accepts as input the contents of the register before
              it is modified by the MPYI3.

              If src2 and dst2 point to the same location, src2 is read before the write to
              dst2.

              Integer overflow occurs when any of the most significant 16 bits of the 48-bit
              result differs from the most significant bit of the 32-bit output value.

Cycles        1

Status Bits   LUF Unchanged.
              LV  1 if an integer overflow occurs, unchanged otherwise.
              UF  0.
              N   1 if a negative result is generated, 0 otherwise.
              Z   1 if a zero result is generated, 0 otherwise.
              V   1 if an integer overflow occurs, 0 otherwise.
              C   Unaffected.

Mode Bit      OVM Operation is affected by OVM bit value.

Example       MPYI3 *++AR0(1),R5,R7
           || STI R2,*-AR3(1)

              Before Instruction:

              AR0 = 80 995Ah
              R5 = 32h = 50
              R7 = 0h
              R2 = 0DCh = 220
              AR3 = 80 982Fh
              Data at 80 995Bh = 0C8h = 200
              Data at 80 982Eh = 0h
              LUF LV UF N Z V C = 0 0 0 0 0 0 0

              After Instruction:

              AR0 = 80 995Bh
              R5 = 32h = 50
              R7 = 2710h = 10000
              R2 = 0DCh = 220
              AR3 = 80 982Fh
              Data at 80 995Bh = 0C8h = 200
              Data at 80 982Eh = 0DCh = 220
              LUF LV UF N Z V C = 0 0 0 0 0 0 0
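
The address-register values in the MPYI3||STI example follow from the two indirect modes it uses. The sketch below traces them in Python. It is an illustration, not TI code: the function name indirect and the mode strings are hypothetical labels, only six displacement modes are covered, and the pre/post-modify behavior is an assumption that happens to agree with the Before and After values shown in the example.

```python
def indirect(ar, disp, mode):
    """Return (effective_address, updated_ar) for a few indirect modes."""
    if mode == "*+ARn(disp)":     # pre-displacement add, ARn not modified
        return ar + disp, ar
    if mode == "*-ARn(disp)":     # pre-displacement subtract, ARn not modified
        return ar - disp, ar
    if mode == "*++ARn(disp)":    # pre-displacement add and modify
        return ar + disp, ar + disp
    if mode == "*--ARn(disp)":    # pre-displacement subtract and modify
        return ar - disp, ar - disp
    if mode == "*ARn++(disp)":    # post-displacement add and modify
        return ar, ar + disp
    if mode == "*ARn--(disp)":    # post-displacement subtract and modify
        return ar, ar - disp
    raise ValueError("mode not covered by this sketch: " + mode)


# State before MPYI3 *++AR0(1),R5,R7 || STI R2,*-AR3(1)
regs = {"AR0": 0x80995A, "AR3": 0x80982F, "R2": 220, "R5": 50, "R7": 0}
mem = {0x80995B: 200, 0x80982E: 0}

src_addr, regs["AR0"] = indirect(regs["AR0"], 1, "*++ARn(disp)")
dst_addr, regs["AR3"] = indirect(regs["AR3"], 1, "*-ARn(disp)")

# Read phase first, then write phase, as the description requires.
product = mem[src_addr] * regs["R5"]
store_value = regs["R2"]
regs["R7"] = product
mem[dst_addr] = store_value
print(hex(regs["AR0"]), hex(regs["AR3"]), regs["R7"], mem[0x80982E])
```

The trace ends with AR0 = 80 995Bh, AR3 unchanged, R7 = 10000, and 220 stored at 80 982Eh, in agreement with the After Instruction listing.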




MPYI3||SUBI3  Parallel MPYI3 and SUBI3

Syntax        MPYI3 srcA, srcB, dst1
           || SUBI3 srcC, srcD, dst2

Operation     srcA × srcB → dst1
           || srcD - srcC → dst2

Operands      srcA, srcB, srcC, srcD
                    Any two must be indirect (disp = 0, 1, IR0, IR1), and
                    any two must be register (R0-R7)

              dst1  register (d1):
                    0 = R0
                    1 = R1
              dst2  register (d2):
                    0 = R2
                    1 = R3
              src1  register (R0-R7)
              src2  register (R0-R7)
              src3  indirect (disp = 0, 1, IR0, IR1)
              src4  indirect (disp = 0, 1, IR0, IR1)

              P     parallel addressing modes (0 ≤ P ≤ 3)
                    Operation (P field)
                    00  src3 × src4, src1 - src2
                    01  src3 × src1, src4 - src2
                    10  src1 × src2, src3 - src4
                    11  src3 × src1, src2 - src4

Encoding      [32-bit instruction word diagram]

Description   An integer multiplication and an integer subtraction are performed in
              parallel. All registers are read at the beginning and loaded at the end of the
              execute cycle. This means that if one of the parallel operations (MPYI3) reads
              from a register and the operation being performed in parallel (SUBI3) writes
              to the same register, then MPYI3 accepts as input the contents of the
              register before it is modified by the SUBI3.

              Any combination of addressing modes may be coded for the four possible
              source operands as long as two are coded as indirect and two are register.
              The assignment of the source operands srcA-srcD to the src1-src4 fields
              varies, depending on the combination of addressing modes used; the P field
              is encoded accordingly. The assembler may, when not significant, change
              the order of operands in commutative operations in order to simplify
              processing.

              Integer overflow occurs when any of the most significant 16 bits of the 48-bit
              result differs from the most significant bit of the 32-bit output value.

Cycles        1

Status Bits   LUF Unchanged.
              LV  1 if an integer overflow occurs, unchanged otherwise.
              UF  1 if an integer underflow occurs, 0 otherwise.
              N   0.
              Z   0.
              V   1 if an integer overflow occurs, 0 otherwise.
              C   Unaffected.

Mode Bit      OVM Operation is affected by OVM bit value.

Example       MPYI3 R2,*++AR0(1),R0
           || SUBI3 *AR5--(IR1),R4,R2

              or

              MPYI3 *++AR0(1),R2,R0
           || SUBI3 *AR5--(IR1),R4,R2

              Before Instruction:

              R2 = 32h = 50
              AR0 = 80 98E3h
              R0 = 0h
              AR5 = 80 99FCh
              IR1 = 0Ch
              R4 = 07D0h = 2000
              Data at 80 98E4h = 62h = 98
              Data at 80 99FCh = 4B0h = 1200
              LUF LV UF N Z V C = 0 0 0 0 0 0 0

              After Instruction:

              R2 = 320h = 800
              AR0 = 80 98E4h
              R0 = 01324h = 4900
              AR5 = 80 99F0h
              IR1 = 0Ch
              R4 = 07D0h = 2000
              Data at 80 98E4h = 62h = 98
              Data at 80 99FCh = 4B0h = 1200

              LUF LV UF N Z V C = 0 0 0 0 0 0 0


MPYSHI  Multiply Signed Integer and Produce 32 MSBs

Syntax        MPYSHI src, dst

Operation     dst × src → dst

Operands      src general addressing modes
              dst register mode (any register in CPU primary register file)

Encoding      [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]

              Instruction Word Fields

              G    src addressing modes
              00   register mode (any CPU register)
              01   direct mode
              10   indirect mode
              11   immediate mode

Description   The 32 MSBs of the product of the numbers at dst and src are loaded into
              the dst register. These numbers, when read, are assumed to be signed
              32-bit integers. The result is assumed to be a signed 64-bit integer. The
              output to the dst register is the 32 most significant bits of the result.

Cycles        1

Status Bits   If ST(SET COND) = 0 and the destination register is R0-R11, the condition
              flags are modified. If ST(SET COND) = 1, they are modified for all
              destination registers.

              LUF Unchanged.
              LV  Unchanged.
              UF  0.
              N   1 if a negative result is generated, 0 otherwise.
              Z   1 if all 64 bits of the product are 0, 0 otherwise.
              V   0.
              C   Unaffected.


Mode Bit      OVM Operation is not affected by OVM bit value.


MPYSHI3  Multiply Signed Integer Producing 32 MSBs, 3 Operands

Syntax        MPYSHI3 src2, src1, dst

Operation     src1 × src2 → dst

Operands      src1  type 1 or type 2 three-operand addressing modes
              src2  type 1 or type 2 three-operand addressing modes
              dst   register mode (any register in CPU primary register file)

Encoding      Type 1  [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]
              Type 2  [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]

              Instruction Word Fields

              Type 1
              T    src1 addressing modes                    src2 addressing modes
              00   register mode (any CPU register)         register mode (any CPU register)
              01   indirect mode (disp = 0, 1, IR0, IR1)    register mode (any CPU register)
              10   register mode (any CPU register)         indirect mode (disp = 0, 1, IR0, IR1)
              11   indirect mode (disp = 0, 1, IR0, IR1)    indirect mode (disp = 0, 1, IR0, IR1)

              Type 2
              T    src1 addressing modes                    src2 addressing modes
              00   register mode (any CPU register)         8-bit signed immediate
              01   register mode (any CPU register)         indirect mode *+ARn(5-bit unsigned
                                                            displacement)
              10   indirect mode *+ARn(5-bit unsigned       8-bit signed immediate
                   displacement)
              11   indirect mode *+ARn1(5-bit unsigned      indirect mode *+ARn2(5-bit unsigned
                   displacement)                            displacement)

Description   The product of the numbers at the src1 and src2 operands is loaded into the
              dst register. The numbers at the src1 and src2 operands are assumed to
              be 32-bit signed integers. The result is assumed to be a signed 64-bit
              integer. The output to the dst register is the 32 most significant bits of the result.

Cycles        1

Status Bits   If ST(SET COND) = 0 and the destination register is R0-R11, the condition
              flags are modified. If ST(SET COND) = 1, they are modified for all
              destination registers.

              LUF Unchanged.
              LV  1 if an integer overflow occurs, unchanged otherwise.
              UF  0.
              N   1 if a negative result is generated, 0 otherwise.
              Z   1 if a zero result is generated, 0 otherwise.
              V   1 if an integer overflow occurs, 0 otherwise.
              C   Unaffected.


Mode Bit      OVM Operation is not affected by OVM bit value.


MPYUHI  Multiply Unsigned Integer and Produce 32 MSBs

Syntax        MPYUHI src, dst

Operation     dst × src → dst

Operands      src general addressing modes
              dst register mode (any register in CPU primary register file)

Encoding      [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]

              Instruction Word Fields

              G    src addressing modes
              00   register mode (any CPU register)
              01   direct mode
              10   indirect mode
              11   immediate mode

Description   The 32 MSBs of the product of the numbers at the dst and src operands are
              loaded into the dst register. These numbers, when read, are assumed to
              be unsigned 32-bit integers. The result is assumed to be an unsigned 64-bit
              integer. The output to the dst register is the 32 most significant bits of the
              result.

Cycles        1

Status Bits   If ST(SET COND) = 0 and the destination register is R0-R11, the condition
              flags are modified. If ST(SET COND) = 1, they are modified for all
              destination registers.

              LUF Unchanged.
              LV  Unchanged.
              UF  0.
              N   0.
              Z   1 if all 64 bits of the product are 0, 0 otherwise.
              V   0.
              C   Unaffected.
Mode Bit      OVM Operation is not affected by OVM bit value.


MPYUHI3  Multiply Unsigned Integer Producing 32 MSBs, 3 Operands

Syntax        MPYUHI3 src2, src1, dst

Operation     src1 × src2 → dst

Operands      src1, src2  both type 1 or type 2 three-operand addressing modes
              dst         register mode (any register in CPU primary register file)

Encoding      Type 1  [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]
              Type 2  [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]

              Instruction Word Fields

              Type 2
              T    src1 addressing modes                    src2 addressing modes
              00   register mode (any CPU register)         8-bit signed immediate
              01   register mode (any CPU register)         indirect mode *+ARn(5-bit unsigned
                                                            displacement)
              10   indirect mode *+ARn(5-bit unsigned       8-bit signed immediate
                   displacement)
              11   indirect mode *+ARn1(5-bit unsigned      indirect mode *+ARn2(5-bit unsigned
                   displacement)                            displacement)

Description   The product of the numbers at the src1 and src2 operands is loaded into the
              dst register. The numbers at the src1 and src2 operands are assumed to be
              32-bit unsigned integers. The result is assumed to be an unsigned 64-bit
              integer. The output to the dst register is the 32 most significant bits of the result.

Cycles        1

Status Bits   If ST(SET COND) = 0 and the destination register is R0-R11, the condition
              flags are modified. If ST(SET COND) = 1, they are modified for all
              destination registers.

              LUF Unchanged.
              LV  Unchanged.
              UF  0.
              N   0.
              Z   1 if all 64 bits of the product are 0, 0 otherwise.
              V   0.
              C   Unaffected.

Mode Bit      OVM Operation is not affected by OVM bit value.
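
The difference between the signed and unsigned high-word multiplies comes down to how the 32-bit operands are interpreted before the 64-bit product is formed. The Python sketch below shows this under stated assumptions: the function names mpyshi and mpyuhi are hypothetical, the flags follow the Status Bits lists of the two-operand MPYSHI and MPYUHI entries (Z reflects all 64 bits of the product), and operands are passed as raw 32-bit words. The test operands are chosen for illustration and do not come from the manual.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(word):
    """Interpret the low 32 bits of word as a two's-complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def mpyshi(src, dst):
    """Signed 32 x 32 multiply; return (upper 32 bits of product, flags)."""
    product = to_signed32(src) * to_signed32(dst)
    high = (product >> 32) & MASK32        # arithmetic shift keeps the sign
    flags = {"UF": 0, "N": high >> 31, "Z": int(product == 0), "V": 0}
    return high, flags


def mpyuhi(src, dst):
    """Unsigned 32 x 32 multiply; return (upper 32 bits of product, flags)."""
    product = (src & MASK32) * (dst & MASK32)
    high = product >> 32
    flags = {"UF": 0, "N": 0, "Z": int(product == 0), "V": 0}
    return high, flags


# The same bit patterns give different upper words in the two interpretations.
print(hex(mpyshi(0xFFFFFFFF, 0xFFFFFFFF)[0]))   # (-1) * (-1) = 1
print(hex(mpyuhi(0xFFFFFFFF, 0xFFFFFFFF)[0]))   # (2**32 - 1) squared
```

Note that Z stays 0 in the signed case above even though the upper word is 0, because the lower half of the 64-bit product is nonzero.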


NEGB  Negative Integer With Borrow

Syntax        NEGB src, dst

Operation     0 - src - C → dst

Operands      src general addressing modes (G):
              00 register (any register in CPU primary register file)
              01 direct
              10 indirect
              11 immediate

              dst register (any register in CPU primary register file)

Encoding      [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]

Description   The difference of the 0, src, and C operands, calculated as shown, is loaded
              into the dst register. The dst and src are assumed to be signed integers.

Cycles        1

Status Bits   If ST(SET COND) = 0 and the destination register is R0-R11, the condition
              flags are modified. If ST(SET COND) = 1, they are modified for all
              destination registers.

              LUF Unaffected.
              LV  1 if an integer overflow occurs, unchanged otherwise.
              UF  0.
              N   1 if a negative result is generated, 0 otherwise.
              Z   1 if a zero result is generated, 0 otherwise.
              V   1 if an integer overflow occurs, 0 otherwise.
              C   1 if a borrow occurs, 0 otherwise.

Mode Bit      OVM Operation is affected by OVM bit value.

Example       NEGB R5,R7

              Before Instruction:

              R5 = 0FFFF FFCBh = -53
              R7 = 0h
              LUF LV UF N Z V C = 0 0 0 0 0 0 1

              After Instruction:

              R5 = 0FFFF FFCBh = -53
              R7 = 34h = 52
              LUF LV UF N Z V C = 0 0 0 0 0 0 1
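
The NEGB operation 0 - src - C can be traced with a few lines of Python. This is a sketch under assumptions: the function name negb is hypothetical, a borrow is taken to mean that the unsigned 32-bit subtraction goes below zero, overflow is taken to mean that the exact signed result does not fit in 32 bits, and OVM = 0 is assumed (no saturation is modeled).

```python
MASK32 = 0xFFFFFFFF


def negb(src, carry):
    """Sketch of NEGB with OVM = 0: return (32-bit result word, flags)."""
    src &= MASK32
    raw = 0 - src - carry                  # unsigned view, may go negative
    result = raw & MASK32                  # wrap to 32 bits
    signed_src = src - (1 << 32) if src & 0x80000000 else src
    exact = 0 - signed_src - carry         # signed view, unbounded precision
    flags = {
        "N": result >> 31,
        "Z": int(result == 0),
        "V": int(not (-2**31 <= exact <= 2**31 - 1)),
        "C": int(raw < 0),                 # assumption: borrow out of bit 31
    }
    return result, flags


# Operands from the NEGB example: R5 = 0FFFF FFCBh (-53), C = 1 beforehand.
result, flags = negb(0xFFFFFFCB, 1)
print(hex(result), flags)
```

For the example operands this yields 34h (52) with C = 1 and the other modeled flags at 0, as in the After Instruction listing. With carry = 0 the same function reduces to a plain two's-complement negation.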


NEGF  Negate Floating-Point Value

Syntax        NEGF src, dst

Operation     0 - src → dst

Operands      src general addressing modes (G):
              00 register (R0-R11)
              01 direct
              10 indirect
              11 immediate

              dst register (R0-R11)

Encoding      [32-bit instruction word; field boundaries at bits 31, 24, 23, 16, 15, 8, 7, 0]

Description   The difference of the 0 and src operands is loaded into the dst register. The
              dst and src operands are assumed to be floating-point numbers.

Cycles        1

Status Bits   LUF 1 if a floating-point underflow occurs, unchanged otherwise.
              LV  1 if a floating-point overflow occurs, unchanged otherwise.
              UF  1 if a floating-point underflow occurs, 0 otherwise.
              N   1 if a negative result is generated, 0 otherwise.
              Z   1 if a zero result is generated, 0 otherwise.
              V   1 if a floating-point overflow occurs, 0 otherwise.
              C   Unaffected.

Mode Bit      OVM Operation is not affected by OVM bit value.

Example       NEGF *++AR3(2),R1

              Before Instruction:

              AR3 = 80 9800h
              R1 = 05 7B40 0025h = 6.28125006e + 01
              Data at 80 9802h = 70C 8000h = 1.4050e + 02
              LUF LV UF N Z V C = 0 0 0 0 0 0 0

              After Instruction:

              AR3 = 80 9802h
              R1 = 07 F380 0000h = -1.4050e + 02
              Data at 80 9802h = 70C 8000h = 1.4050e + 02
              LUF LV UF N Z V C = 0 0 0 0 0 0 0
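
Because the mantissa is held in two's complement, negating a value changes the mantissa bits and not just a sign bit, as the NEGF example shows (70C 8000h becomes 07 F380 0000h). The following Python sketch reproduces that result. It is illustrative only: the function names are hypothetical, the 32-bit memory format is assumed to be an 8-bit exponent, a sign bit, and a 23-bit fraction, the 40-bit register format is assumed to be an 8-bit exponent, a sign bit, and a 31-bit fraction, an exponent of -128 is assumed to encode zero, and rounding, overflow, and underflow are not modeled.

```python
import math


def c40_short_to_float(word32):
    """Decode a 32-bit memory word (assumed 8-bit exponent, sign, 23-bit fraction)."""
    exp = (word32 >> 24) & 0xFF
    if exp >= 0x80:
        exp -= 0x100
    if exp == -128:                        # assumption: reserved encoding of zero
        return 0.0
    sign = (word32 >> 23) & 1
    frac = (word32 & 0x7FFFFF) / 2**23
    return ((frac - 2.0) if sign else (1.0 + frac)) * 2.0**exp


def float_to_c40_ext(x):
    """Encode a float as a 40-bit register word (fraction truncated to 31 bits)."""
    if x == 0.0:
        return 0x80 << 32
    m, e = math.frexp(x)                   # x = m * 2**e with 0.5 <= |m| < 1
    s, e = m * 2.0, e - 1                  # now 1 <= |s| < 2
    if s == -1.0:                          # negative powers of two use -2.0
        s, e = -2.0, e - 1
    if s > 0:
        mant = int((s - 1.0) * 2**31)
    else:
        mant = 0x80000000 | int((s + 2.0) * 2**31)
    return ((e & 0xFF) << 32) | mant


def negf(word32):
    """Sketch of NEGF on a memory operand: 0 - src, returned as a 40-bit word."""
    return float_to_c40_ext(0.0 - c40_short_to_float(word32))


print(hex(negf(0x070C8000)))               # source operand of the NEGF example
```

For the source operand 70C 8000h (140.5) the sketch returns 07 F380 0000h, the value shown for R1 after the instruction.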


11-168 | | Assembly Language Instructions 


Parallel NEGF and STF NEG F| [STF 


Syntax NEGF src2, dst? 
|| STF src3, dst2 

Operation 0 — src2 = dst1 
| src3— dst2 


Operands src2__ indirect (disp = 0, 1, IRO, IR1) 
dst! register (RO — R7) 
src3__register (RO — R7) 
dst2 indirect (disp = 0, 1, IRO, IR1) 


Encoding 
87 0 


31 24 23 16 15 
Peele pepe, ae pe 


Description A floating-point negation and a floating-point store are performed in parallel. 
All registers are read at the beginning and loaded at the end of the execute 
cycle. This means that if one of the parallel operations (STF) reads from a 
register and the operation being performed in parallel (NEGF) writes to the 
same register, then STF accepts as input the contents of the fegisistP before 
it is modified by the NEGF. 


If src2 and dst2 point to the same location, src2 is read before the write to 
ast2. 


Cycles 1 


Status Bits LUF 1 if a floating-point underflow occurs, 0 unchanged otherwise. 
LV 1 if a floating-point overflow occurs, unchanged otherwise. 
UF 1 if a floating-point underflow occurs, 0 otherwise. 
N 1 if anegative result is generated, 0 otherwise. 
Z 1 éifazero result is generated, 0 otherwise. 
V1 if a floating-point overflow occurs, 0 otherwise. 
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 


11-169 


NEGF||STF Parallel NEFG and STF 


ASS Satanat artetetesatetetesataratatetatonogaratoteeatonatorsnvntetoteherevetenetsferarahetesonentton a SS a Sd 


Example NEGF*AR4— -—(1),R7 
ll STF R2,*++AR5 (1) 


fore Instructi 


AR4 = 80 98E1h 

R7 = 0h

R2 = 07 33C0 0000h = 1.79750e + 02 

AR5 = 80 9803h 

Data at 80 98E1h = 57B 4000h = 6.281250e + 01
Data at 80 9804h = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction: 


AR4 = 80 98E0h 

R7 = 05 84C0 0000h = -6.281250e + 01

R2 = 07 33C0 0000h = 1.79750e + 02 

AR5 = 80 9804h 

Data at 80 98E1h = 57B 4000h = 6.281250e + 01
Data at 80 9804h = 733 C000h = 1.79750e + 02
LUF LV UF N Z V C = 0 0 0 0 0 0 0


Negate Integer NEGI


Syntax NEGI src, dst
Operation 0 - src → dst


Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate


dst register (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0


Description The difference of the 0 and src operands is loaded into the dst register. The 
dst and src operands are assumed to be signed integers. 


Cycles 1 


Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.

LUF Unaffected.

LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.

N 1 if a negative result is generated, 0 otherwise.

Z 1 if a zero result is generated, 0 otherwise.

V 1 if an integer overflow occurs, 0 otherwise.

C 1 if a borrow occurs, 0 otherwise.


Mode Bit OVM Operation is affected by OVM bit value. 
Example NEGI 174,R5 (174 = 0AEh)
Before Instruction:


R5 = 0DCh = 220
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


R5 = 0FFFF FF52h = -174
LUF LV UF N Z V C = 0 0 0 1 0 0 1
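
The flag rules above can be checked with a small behavioral model. The Python sketch below is not TI code: the function name `negi` is hypothetical, and the saturation value used when OVM is set (7FFFFFFFh) is an assumption made for illustration rather than something stated on this page.

```python
MASK32 = 0xFFFFFFFF

def negi(src, ovm=False):
    """Model 0 - src on 32-bit two's-complement words.

    Returns (result, flags) with flags holding N, Z, V and C.
    Assumption: with OVM set, an overflowed result saturates to 7FFFFFFFh.
    """
    src &= MASK32
    result = (0 - src) & MASK32
    overflow = src == 0x80000000      # -(-2**31) does not fit in 32 bits
    if overflow and ovm:
        result = 0x7FFFFFFF
    flags = {
        "N": result >> 31,            # sign bit of the result
        "Z": int(result == 0),
        "V": int(overflow),
        "C": int(src != 0),           # 0 - src borrows for any nonzero src
    }
    return result, flags

result, flags = negi(174)
print(f"{result:08X}", flags)
```

For the operand 174 the model reproduces the register value and the N and C flags shown in the example.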


NEGI||STI Parallel NEGI and STI


Syntax NEGI src2, dst1
|| STI src3, dst2
Operation 0 - src2 → dst1
|| src3 → dst2


Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0 - R7)
src3 register (R0 - R7)
dst2 indirect (disp = 0, 1, IR0, IR1)


Encoding
31 24 23 16 15 8 7 0


Description An integer negation and an integer store are performed in parallel. All
registers are read at the beginning and loaded at the end of the execute
cycle. This means that if one of the parallel operations (STI) reads from a
register and the operation being performed in parallel (NEGI) writes to the
same register, then STI accepts as input the contents of the register before
it is modified by the NEGI.


If src2 and dst2 point to the same location, src2 is read before the write to
dst2.


Cycles 1


Status Bits LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C 1 if a borrow occurs, 0 otherwise.


Mode Bit OVM Operation is affected by OVM bit value. 


Example NEGI *-AR3,R2
|| STI R2,*AR1++


Before Instruction: 


AR3 = 80 982Fh

R2 = 19h = 25

AR1 = 80 98A5h

Data at 80 982Eh = 0DCh = 220

Data at 80 98A5h = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


AR3 = 80 982Fh

R2 = 0FFFF FF24h = -220
AR1 = 80 98A6h

Data at 80 982Eh = 0DCh = 220

Data at 80 98A5h = 19h = 25
LUF LV UF N Z V C = 0 0 0 1 0 0 1
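
The read-before-write rule for parallel instructions can be sketched as a two-phase update. The Python model below is an illustration under stated assumptions: `negi_par_sti` is a hypothetical name, registers and memory are plain dictionaries, and the address-register updates performed by the indirect modes are left out (the caller passes effective addresses).

```python
def negi_par_sti(regs, mem, src2_addr, dst1, src3, dst2_addr):
    """Model NEGI src2,dst1 || STI src3,dst2 with read-before-write ordering."""
    # Read phase: every operand is sampled before anything is written.
    negate_in = mem[src2_addr]
    store_in = regs[src3]
    # Write phase: both results land at the end of the execute cycle.
    regs[dst1] = (0 - negate_in) & 0xFFFFFFFF
    mem[dst2_addr] = store_in

# Values from the example: R2 is both the NEGI destination and the STI source.
regs = {"R2": 0x19}
mem = {0x80982E: 0xDC, 0x8098A5: 0x0}
negi_par_sti(regs, mem, 0x80982E, "R2", "R2", 0x8098A5)
print(hex(regs["R2"]), hex(mem[0x8098A5]))
```

Because the store operand is captured in the read phase, memory receives the old R2 (19h) even though R2 itself ends up holding the negated value.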


NOP No Operation


Syntax NOP src


Operation No ALU or multiplier operations.
ARn is modified if src is specified in indirect mode.

Operands src general addressing modes (G):
00 register (no operation)
10 indirect (modify ARn, 0 ≤ n ≤ 7)


Encoding
31 24 23 16 15 8 7 0


Description If the src operand is specified in the indirect mode, the specified addressing
operation is performed and a dummy memory read occurs. If the src operand
is omitted, no operation is performed.


Cycles 1 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected. 
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value.
Example NOP 

Before Instruction: 

PC = 3Ah 

After Instruction: 

PC = 3Bh 
Example NOP *AR3- -(1) 

Before Instruction: 


PC = 5h 
AR3 = 80 9900h 


After Instruction: 


PC = 6h 
AR3 = 80 98FFh 


Normalize NORM


Syntax NORM src, dst
Operation norm (src) → dst


Operands src general addressing modes (G):
00 register (R0 - R11)
01 direct
10 indirect
11 immediate


Encoding
31 24 23 16 15 8 7 0


Description The src operand is assumed to be an unnormalized floating-point number;
i.e., the implied bit is set equal to the sign bit. The dst is set equal to the
normalized src operand with the implied bit removed. The dst operand exponent
is set to the src operand exponent minus the size of the left shift necessary
to normalize the src. The dst operand is assumed to be a normalized
floating-point number.


For values of src:

- If src (exp) = -128 and src (man) = 0, then dst = 0, Z = 1, and UF = 0.

- If src (exp) = -128 and src (man) ≠ 0, then dst = 0, Z = 0, and UF = 1.

- For all other cases of the src, if a floating-point underflow occurs, then
dst (man) is forced to 0 and dst (exp) = -128. If src (man) = 0, then
dst (man) = 0 and dst (exp) = -128. Refer to Section 4.7 on page 4-24.


Cycles 1 


Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV Unaffected.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value. 


Example NORM R1,R2
Before Instruction:


R1 = 04 0000 3AF5h
R2 = 07 0C80 0000h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction: 


R1 = 04 0000 3AF5h 
R2 = F2 6BD4 0000h = 1.12451613e — 04 
LUF LV UF N Z V C = 0 0 0 0 0 0 0
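
The normalization described above can be sketched in Python. This is a hedged model, not TI's implementation: `norm` is a hypothetical name, the exponent is passed as a signed integer, the mantissa as the 32-bit field (sign bit plus 31 fraction bits), and the underflow threshold (result exponent below -127) is an assumption.

```python
def norm(exp, man):
    """Normalize an unnormalized value; returns (exp, mantissa_field, uf)."""
    m = man - (1 << 32) if man & 0x80000000 else man   # signed view of the field
    if m == 0:
        return -128, 0, 0                 # zero mantissa: result is zero
    if exp == -128:
        return -128, 0, 1                 # nonzero mantissa at exp = -128
    # The implied bit starts out equal to the sign bit. Shift left until the
    # implied bit (bit 31 of a 33-bit view) differs from the sign (bit 32).
    shift = 0
    while not (2**31 <= (m << shift) < 2**32
               or -2**32 <= (m << shift) < -2**31):
        shift += 1
    new_exp = exp - shift
    if new_exp < -127:                    # assumed underflow threshold
        return -128, 0, 1
    sign = 1 if m < 0 else 0
    fraction = (m << shift) & 0x7FFFFFFF  # implied bit is dropped
    return new_exp, (sign << 31) | fraction, 0

new_exp, new_man, uf = norm(0x04, 0x00003AF5)
print(f"{new_exp & 0xFF:02X} {new_man:08X}", uf)
```

With the operands of the example (exponent 04h, mantissa 0000 3AF5h) the model shifts left by 18 places, which yields exponent F2h and mantissa 6BD4 0000h.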


Bitwise Logical Complement NOT


Syntax NOT src, dst
Operation ~src → dst


Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate

dst register (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0


Description The bitwise logical complement of the src operand is loaded into the dst
register. The complement is formed by a logical NOT of each bit of the src
operand. The dst and src operands are assumed to be unsigned integers.


Cycles 1


Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.

LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is affected by OVM bit value.


Example NOT @982Ch,R4


Before Instruction:

DP = 80h
R4 = 0h
Data at 80 982Ch = 5E2Fh
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:

DP = 80h
R4 = 0FFFF A1D0h
Data at 80 982Ch = 5E2Fh
LUF LV UF N Z V C = 0 0 0 1 0 0 0
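
As a quick check of the result and of the N and Z rules, the complement can be modeled in a few lines of Python. The helper name `not32` is hypothetical; only the flags that depend on the result are returned.

```python
def not32(src):
    """Bitwise complement of a 32-bit word, with the N and Z flags it implies."""
    result = ~src & 0xFFFFFFFF
    flags = {"N": result >> 31, "Z": int(result == 0)}   # N is the result MSB
    return result, flags

result, flags = not32(0x5E2F)
print(f"{result:08X}", flags)
```

The model gives FFFFA1D0h with N = 1 for the operand 5E2Fh used in the example.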


NOT||STI Parallel NOT and STI


Syntax NOT src2, dst1
|| STI src3, dst2


Operation ~src2 → dst1
|| src3 → dst2


Operands src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0 - R7)
src3 register (R0 - R7)
dst2 indirect (disp = 0, 1, IR0, IR1)


Encoding
31 24 23 16 15 8 7 0


Description A bitwise logical NOT and an integer store are performed in parallel. All
registers are read at the beginning and loaded at the end of the execute cycle.
This means that if one of the parallel operations (STI) reads from a register
and the operation being performed in parallel (NOT) writes to the same
register, then STI accepts as input the contents of the register before it is
modified by the NOT.


If src2 and dst2 point to the same location, src2 is read before the write to
dst2.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example NOT *+AR2,R3
|| STI R7,*--AR4(IR1)


Before Instruction:

AR2 = 80 99CBh
R3 = 0h
R7 = 0DCh = 220
AR4 = 80 9850h
IR1 = 10h
Data at 80 99CCh = 0C2Fh
Data at 80 9840h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:

AR2 = 80 99CBh
R3 = 0FFFF F3D0h
R7 = 0DCh = 220
AR4 = 80 9840h
IR1 = 10h
Data at 80 99CCh = 0C2Fh
Data at 80 9840h = 0DCh = 220
LUF LV UF N Z V C = 0 0 0 1 0 0 0


OR Bitwise Logical OR


Syntax OR src, dst
Operation dst OR src → dst


Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate (not sign-extended)

dst register (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0


Description The bitwise logical OR between the src and dst operands is loaded into the
dst register. The dst and src operands are assumed to be unsigned integers.

Cycles 1


Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.

LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.
Example OR *++AR1(IR1),R2


Before Instruction:

AR1 = 80 9800h
IR1 = 4h
R2 = 01256 0000h
Data at 80 9804h = 2BCDh
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:

AR1 = 80 9804h
IR1 = 4h
R2 = 01256 2BCDh
Data at 80 9804h = 2BCDh
LUF LV UF N Z V C = 0 0 0 0 0 0 0
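
The note that an immediate src is not sign-extended matters when bit 15 of the immediate is set. The following Python sketch illustrates the difference; it assumes a 16-bit immediate field, and both function names are hypothetical.

```python
def or_immediate(dst, imm16):
    """OR a 32-bit register with a zero-extended 16-bit immediate."""
    return (dst | (imm16 & 0xFFFF)) & 0xFFFFFFFF

def or_sign_extended(dst, imm16):
    """For contrast only: what a sign-extended immediate would produce."""
    imm = imm16 & 0xFFFF
    if imm & 0x8000:
        imm |= 0xFFFF0000
    return (dst | imm) & 0xFFFFFFFF

print(f"{or_immediate(0x12560000, 0x8001):08X}")
print(f"{or_sign_extended(0x12560000, 0x8001):08X}")
```

With zero extension the upper half of the destination is preserved; a sign-extended immediate would force the upper 16 bits to 1 whenever bit 15 is set.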


Bitwise Logical OR, 3 Operands OR3


Syntax OR3 src2, src1, dst


Operation src1 | src2 → dst (| = OR)
Operands src1, src2 both type 1 or type 2 three-operand addressing modes
dst register mode (any register in CPU primary register file)
Encoding
Type 1
31 24 23 16 15 8 7 0
Type 2
31 24 23 16 15 8 7 0

Operand addressing-mode combinations:
00 register mode (any CPU register) register mode (any CPU register)
indirect mode (disp = 0, 1, IR0, IR1) register mode (any CPU register)
register mode (any CPU register) indirect mode (disp = 0, 1, IR0, IR1)
indirect mode (disp = 0, 1, IR0, IR1) indirect mode (disp = 0, 1, IR0, IR1)


Description The bitwise logical OR between the numbers at the src1 and src2 operands
is loaded into the dst register. The numbers at the src1, src2, and dst
operands are assumed to be unsigned integers.

Cycles 1


Status Bits If ST (SETCOND) = 0, the condition flags are modified if the destination
register is R0 - R11. If ST (SETCOND) = 1, they are modified for all destination
registers.

LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Parallel OR3 and STI OR3||STI


Syntax OR3 src2, src1, dst1
|| STI src3, dst2

Operation src1 OR src2 → dst1
|| src3 → dst2


Operands src1 register (R0 - R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0 - R7)
src3 register (R0 - R7)
dst2 indirect (disp = 0, 1, IR0, IR1)
Encoding
31 24 23 16 15 8 7 0


Description A bitwise logical OR and an integer store are performed in parallel. All
registers are read at the beginning and loaded at the end of the execute cycle.
This means that if one of the parallel operations (STI) reads from a register
and the operation being performed in parallel (OR3) writes to the same
register, then STI accepts as input the contents of the register before it is
modified by the OR3.


If src2 and dst2 point to the same location, src2 is read before the write to
dst2.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example OR3 *++AR2,R5,R2
|| STI R6,*AR1--


Before Instruction:


AR2 = 80 9830h

R5 = 80 0000h

R2 = 0h

R6 = 0DCh = 220

AR1 = 80 9883h

Data at 80 9831h = 9800h

Data at 80 9883h = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


AR2 = 80 9831h

R5 = 80 0000h

R2 = 80 9800h

R6 = 0DCh = 220

AR1 = 80 9882h

Data at 80 9831h = 9800h

Data at 80 9883h = 0DCh = 220

LUF LV UF N Z V C = 0 0 0 0 0 0 0


POP Integer POP


Syntax POP dst
Operation *SP-- → dst


Operands dst register (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0
000 011100 01 dst 0000000000000000


Description The top of the current system stack is popped and loaded into the dst regis- 
ter. The top of the stack is assumed to be a signed integer. The POP is per- 
formed with a post decrement of the stack pointer. 


Cycles 1 


Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.

LUF Unaffected.

LV Unaffected.

UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.

C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value. 
Example POP R3
Before Instruction:


SP = 80 9856h

R3 = 012DAh = 4,826

Data at 80 9856h = 0FFFF 0DA4h = -62,044

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


SP = 80 9855h

R3 = 0FFFF 0DA4h = -62,044

Data at 80 9856h = 0FFFF 0DA4h = -62,044

LUF LV UF N Z V C = 0 0 0 1 0 0 0


POPF POP Floating-Point Value


Syntax POPF dst
Operation *SP-- → dst


Operands dst register (R0 - R11)


Encoding
31 24 23 16 15 8 7 0
000 011101 01 dst 0000000000000000


Description The top of the current system stack is popped and loaded into the dst regis- 
ter. The top of the stack is assumed to be a floating-point number. The POP 
is performed with a post decrement of the stack pointer. 


Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.
Example POPF R4


Before Instruction:


SP = 80 984Ah

R4 = 02 5D2E 0123h = 6.91186578e + 00

Data at 80 984Ah = 5F2C 1302h = 5.32544007e + 28
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


SP = 80 9849h

R4 = 5F 2C13 0200h = 5.32544007e + 28

Data at 80 984Ah = 5F2C 1302h = 5.32544007e + 28
LUF LV UF N Z V C = 0 0 0 0 0 0 0


PUSH Integer PUSH


Syntax PUSH src


Operation src → *++SP


Operands src register (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0


Description The contents of the src register are pushed on the current system stack. The
src is assumed to be a signed integer. The PUSH is performed with a
preincrement of the stack pointer.


Cycles 1


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected.
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 
Example PUSH R6


Before Instruction:


SP = 80 98AEh

R6 = 815Bh = 33,115

Data at 80 98AFh = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:
SP = 80 98AFh
R6 = 815Bh = 33,115


Data at 80 98AFh = 815Bh = 33,115
LUF LV UF N Z V C = 0 0 0 0 0 0 0
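
The stack discipline used by PUSH (preincrement) and POP (postdecrement) can be sketched as follows. This Python model is illustrative only: the class name `SystemStack` and the dictionary used as memory are assumptions, and condition flags are not modeled.

```python
class SystemStack:
    """Upward-growing stack: SP always points at the current top element."""

    def __init__(self, sp, mem=None):
        self.sp = sp
        self.mem = {} if mem is None else mem

    def push(self, value):
        self.sp += 1                          # preincrement, then store
        self.mem[self.sp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.sp]             # load, then postdecrement
        self.sp -= 1
        return value

stack = SystemStack(sp=0x8098AE)
stack.push(0x815B)
print(hex(stack.sp), hex(stack.mem[0x8098AF]))
```

A pop leaves the memory word in place and only moves SP, which matches the unchanged "Data at" lines in the POP example.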


PUSHF PUSH Floating-Point Value


Syntax PUSHF src

Operation src → *++SP
Operands src register (R0 - R11)

Encoding
31 24 23 16 15 8 7 0


Description The contents of the src register are pushed onto the current system stack.
The src is assumed to be a floating-point number. The PUSH is performed
with a preincrement of the stack pointer.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example PUSHF R2
Before Instruction:
SP = 80 9801h


R2 = 02 5C12 8081h = 6.87725854e + 00
Data at 80 9802h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


SP = 80 9802h

R2 = 02 5C12 8081h = 6.87725854e + 00

Data at 80 9802h = 025C 1280h = 6.87725830e + 00
LUF LV UF N Z V C = 0 0 0 0 0 0 0
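
The PUSHF and POPF examples suggest a simple relationship between the 40-bit register value and the 32-bit memory word: the low 8 mantissa bits are dropped on a push and come back as zeros on a pop. The Python sketch below is inferred from those two examples only; the function names are hypothetical and no rounding is assumed.

```python
def pushf_word(reg40):
    """32-bit word stored for a 40-bit register (exponent + upper 24 mantissa bits)."""
    return (reg40 >> 8) & 0xFFFFFFFF

def popf_reg(word32):
    """40-bit register value loaded from a 32-bit word (low 8 mantissa bits zero)."""
    return (word32 & 0xFFFFFFFF) << 8

print(f"{pushf_word(0x025C128081):08X}")
print(f"{popf_reg(0x5F2C1302):010X}")
```

A push followed by a pop therefore does not restore the low 8 bits of an extended-precision value, which is visible in the small change of the decimal value in the PUSHF example.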




Reciprocal of Floating-Point Value RCPF 


Syntax RCPF src, dst 
Operation 16-bit reciprocal of src > dst 


Operands src extended-precision register, direct and indirect addressing modes
dst R0 - R11


Encoding
31 24 23 16 15 8 7 0


Instruction Word Fields
G src addressing modes
00 extended-precision register (R0 - R11)
01 direct mode
10 indirect mode


Description The 16-bit approximation of the reciprocal of the src operand is loaded into
the dst register. The dst and src operands are assumed to be floating-point
numbers.


Cycles 1 


Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV 1 if a floating-point overflow occurs, unchanged otherwise.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if a floating-point overflow occurs, 0 otherwise.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Syntax RETIcond


Operation If (cond is true)
*(SP) → PC
ST(PGIE) → ST(GIE)
ST(PCF) → ST(CF)
Else, continue


Operands None 


Encoding
31 24 23 16 15 8 7 0
01111 000000 cond 0000000000000000


Description If the condition is true, then the top of the stack is popped to the PC, PGIE
is copied to GIE, and PCF is copied to CF. If the condition is not true, then
normal operation continues (see Section 11.2 on page 11-10 for a list of
condition mnemonics, encoding, and flags).


Cycles 4 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected. 
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Return From Interrupt or Trap Conditionally Delayed RETIcondD


Syntax RETIcondD


Operation If (cond is true)
*(SP) → PC
ST(PGIE) → ST(GIE)
ST(PCF) → ST(CF)
Else, continue
Operands None


Encoding
31 24 23 16 15 8 7 0
01111 000001 cond 0000000000000000


Description Performs a delayed return from an interrupt or trap.


Since this is a delayed return, the three instructions following the
RETIcondD are fetched and executed. These three instructions may neither
modify the program flow nor load the status register (see Section 11.2
on page 11-10 for a list of condition mnemonics, encoding, and flags).


Interrupts are disabled for the duration of the RETIcondD.
Cycles 1


Status Bits LUF Unaffected. 
LV Unaffected. 

UF Unaffected. 

N Unaffected. 

Z Unaffected. 

V Unaffected. 

C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 


RETScond Return from Subroutine Conditionally


Syntax RETScond


Operation If cond is true:
*SP-- → PC.
Else, continue.


Operands None


Encoding
31 24 23 16 15 8 7 0


Description A conditional return is performed. If the condition is true, the top of the stack
is popped to the PC.


The TMS320C40 provides 20 condition codes that can be used with this
instruction (see Section 11.2 on page 11-10 for a list of condition mnemonics,
encoding, and flags).


Cycles


Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example RETSGE


Before Instruction:


PC = 123h

SP = 80 983Ch

Data at 80 983Ch = 456h

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:
PC = 456h
SP = 80 983Bh


Data at 80 983Ch = 456h
LUF LV UF N Z V C = 0 0 0 0 0 0 0
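
The conditional return can be sketched as a function of the current PC, SP, and the already evaluated condition. This Python model is an illustration under assumptions: `rets_cond` is a hypothetical name, the condition is passed in as a boolean rather than derived from the status flags, and the "continue" case is modeled as PC + 1.

```python
def rets_cond(pc, sp, mem, cond):
    """Return (new_pc, new_sp) for a conditional subroutine return."""
    if cond:
        return mem[sp], sp - 1    # pop the return address: *SP-- -> PC
    return pc + 1, sp             # condition false: fall through

mem = {0x80983C: 0x456}
print([hex(v) for v in rets_cond(0x123, 0x80983C, mem, cond=True)])
```

With the condition true the model reproduces the example: PC becomes 456h and SP drops to 80 983Bh, while the stack word itself is left in memory.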


Syntax RND src, dst


Operation rnd(src) → dst


Operands src general addressing modes (G):
00 register (R0 - R11)
01 direct
10 indirect
11 immediate


dst register (R0 - R11)


Encoding
31 24 23 16 15 8 7 0


Description The result of rounding the src operand is loaded into the dst register. The src
operand is rounded to the nearest single-precision floating-point value. If 
the src operand is exactly halfway between two single-precision values, it 
is rounded to the most positive of those values. 


Cycles 1 


Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV 1 if a floating-point overflow occurs, unchanged otherwise.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if a floating-point overflow occurs, 0 otherwise.
C Unaffected.


Mode Bit OVM Operation is affected by OVM bit value. 
Example RND R5,R2


Before Instruction:

R5 = 07 33C1 6EEFh = 1.79755599e + 02
R2 = 0h

LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


R5 = 07 33C1 6EEFh = 1.79755599e + 02
R2 = 07 33C1 6F00h = 1.79755600e + 02
LUF LV UF N Z V C = 0 0 0 0 0 0 0
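
One way to picture the rounding rule is as an operation on the 32-bit mantissa field: add half of the discarded range, then clear the low 8 bits. The Python sketch below is a simplified model, not the hardware algorithm; `rnd_mantissa` is a hypothetical name, and mantissa overflow with the resulting exponent adjustment is deliberately not modeled.

```python
def rnd_mantissa(man32):
    """Round a 32-bit mantissa field to its upper 24 bits.

    Adding 80h before truncating sends exact ties to the more positive
    neighbor, because the field is a two's-complement quantity.
    Assumption: the addition does not overflow the mantissa.
    """
    return (man32 + 0x80) & 0xFFFFFF00

print(f"{rnd_mantissa(0x33C16EEF):08X}")
```

Applied to the mantissa 33C1 6EEFh from the example, the model gives 33C1 6F00h, matching the value loaded into R2.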




ROL Rotate Left 


Syntax ROL dst 
Operation dst left-rotated 1 bit > dst 


Operands dst register (any register in CPU primary register file) 
Encoding
31 24 23 16 15 8 7 0
000 100011 11 dst 0000000000000000


Description The contents of the dst operand are left-rotated one bit and loaded into the 
dst register. This is a circular rotate with the MSB transferred into the LSB. 


Rotate left:
Cycles 1 


Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.

LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Set to the value of the bit rotated out of the high-order bit. Unaffected
if dst is not R0 - R7.

Mode Bit OVM Operation is not affected by OVM bit value.
Example ROL R3
Before Instruction:


R3 = 8002 5CD4h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


R3 = 0004 B9A9h
LUF LV UF N Z V C = 0 0 0 0 0 0 1
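
A circular left rotate on a 32-bit word can be written in a few lines. The Python helper below is a sketch with a hypothetical name (`rol`); it returns the rotated word together with the bit that left the MSB, which is the value the C flag takes when it is updated.

```python
def rol(dst):
    """Rotate a 32-bit word left by one; returns (result, bit_rotated_out)."""
    msb = (dst >> 31) & 1
    result = ((dst << 1) & 0xFFFFFFFF) | msb   # MSB wraps around into the LSB
    return result, msb

result, carry = rol(0x80025CD4)
print(f"{result:08X}", carry)
```

For 8002 5CD4h the model gives 0004 B9A9h with a carry of 1, as in the example.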


Rotate Left Through Carry ROLC


Syntax ROLC dst

Operation dst left-rotated 1 bit through carry bit → dst
Operands dst register (any register in CPU primary register file)
Encoding
31 24 23 16 15 8 7 0


Description The contents of the dst operand are left-rotated one bit through the carry bit
and loaded into the dst register. The MSB is rotated to the carry bit; at the
same time, the carry bit is transferred to the LSB.


Rotate left through carry bit:


Cycles 1


Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.

LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Set to the value of the bit rotated out of the high-order bit. If dst is not
R0 - R7, then C is shifted into the dst but not changed.

Mode Bit OVM Operation is not affected by OVM bit value.


Example ROLC R3
Before Instruction:


R3 = 0000 0420h
LUF LV UF N Z V C=0


After Instruction:


R3 = 0000 0841h
LUF LV UF N Z V C=0


Example ROLC R3


Before Instruction:


R3 = 8000 4281h
LUF LV UF N Z V C=0


After Instruction:
R3 = 0000 8502h


LUF LV UF N Z V C=0




Rotate Right ROR 


Syntax ROR dst
Operation dst right-rotated 1 bit through carry bit → dst
Operands dst register (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0


Description The contents of the dst operand are right-rotated one bit and loaded into the
dst register. The LSB is rotated into the carry bit and also transferred into
the MSB.


Rotate right: 


Cycles 1


Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.

LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Set to the value of the bit rotated out of the high-order bit. Unaffected
if dst is not R0 - R7.
Mode Bit OVM Operation is not affected by OVM bit value. 
Example ROR R7


Before Instruction:


R7 = 0000 0421h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:


R7 = 8000 0210h
LUF LV UF N Z V C = 0 0 0 1 0 0 1
RORC Rotate Right Through Carry


Syntax RORC dst
Operation dst right-rotated 1 bit through carry bit → dst
Operands dst register (any register in CPU primary register file)


Encoding
31 24 23 16 15 8 7 0


Description The contents of the dst operand are right-rotated one bit through the status
register's carry bit. This could be viewed as a 33-bit shift. The carry bit value
is rotated into the MSB of the dst; at the same time, the dst LSB is rotated
into the carry bit.


Rotate right through carry bit:


Cycles 1


Status Bits If ST (SETCOND) = 0 and the destination register is R0 - R11, the condition
flags are modified. If ST (SETCOND) = 1, they are modified for all destination
registers.

LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Set to the value of the bit rotated out of the high-order bit. If dst is not
R0 - R7, then C is shifted in but not changed.

Mode Bit OVM Operation is not affected by OVM bit value.
Example RORC R4
Before Instruction:


R4 = 8000 0081h
LUF LV UF N Z V C = 0 0 0 1 0 0 0


After Instruction:


R4 = 4000 0040h
LUF LV UF N Z V C = 0 0 0 0 0 0 1
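
The "33-bit shift" view can be made concrete by treating the carry bit and the 32-bit register as one 33-bit circular quantity. The Python sketch below uses a hypothetical function name (`rorc`) and returns the new register value together with the new carry.

```python
def rorc(dst, carry):
    """Rotate right one bit through carry; returns (result, new_carry)."""
    dst &= 0xFFFFFFFF
    result = ((carry & 1) << 31) | (dst >> 1)   # old carry enters at the MSB
    return result, dst & 1                      # old LSB becomes the carry

value, carry = rorc(0x80000081, 0)
print(f"{value:08X}", carry)

# Rotating 33 times walks every bit around the full 33-bit loop once.
v, c = 0x80000081, 0
for _ in range(33):
    v, c = rorc(v, c)
print(f"{v:08X}", c)
```

The first call reproduces the example (4000 0040h with C = 1); the loop shows that 33 successive rotations restore both the register and the carry.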


Repeat Block RPTB


Syntax RPTB src


Operation src + PC + 1 → RE
1 → ST (RM)
Next PC → RS
Operands src 24-bit signed immediate displacement or register mode
Encoding
For 24-bit signed immediate or register mode:
31 24 23 16 15 8 7 0
01100100 src (displacement)
For register mode:
31 24 23 16 15 8 7 0


011110010/000000 00000000000 0 src


Description RPTB allows a block of instructions to be repeated a number of times with- 
out any penalty for looping. 


It activates the block repeat mode of updating the PC. The src operand may
be a 32-bit register value or a 24-bit signed immediate value (displacement). 
The resulting src address is the end address of the block to be repeated. 
This address is loaded into the repeat end address (RE) register. A 1 is writ- 
ten into the repeat mode bit of status register (ST(RM)) to indicate that the 
PC is to be updated in the repeat mode. The address of the next instruction 
is loaded into the repeat start address (RS) register. 


Cycles 4 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected. 
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 


Syntax RPTBD src


Operation If src is an immediate value (displacement):
src + PC + 3 → RE
Else:
src → RE
1 → ST(RM)
PC of RPTBD + 4 → RS


Operands src 24-bit signed immediate displacement or register mode
Encoding
For 24-bit signed immediate or register mode:
31 24 23 16 15 8 7 0
src (displacement)


For register mode:
31 24 23 16 15 8 7 0
src


Description RPTBD allows a block of instructions to be repeated a number of times
without any penalty for looping and with single-cycle execution of the RPTBD
instruction.


It activates the block repeat mode of updating the PC. The src operand may
be a 32-bit register value or a 24-bit signed immediate value (displacement).
The resulting src address is loaded into the repeat end address (RE) register
(block end address). A 1 is written to the status-register repeat mode
bit (ST(RM)), indicating the PC is to be updated in the repeat mode. The
address of the next instruction + 3 is loaded into the repeat start address (RS)
register.


RPTBD does not flush the pipeline. The three instructions following RPTBD
are executed, and none of them may be an instruction that modifies the
program flow. These three instructions are not part of the block that is repeated.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.
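
The difference between the RS and RE values set up by RPTB and RPTBD can be summarized in a small Python sketch. This is an illustration, not a cycle-accurate model: `repeat_block_setup` is a hypothetical name, the sample addresses are made up, and treating a register-mode src as an absolute end address for the non-delayed form as well is an assumption carried over from the RPTBD operation.

```python
def repeat_block_setup(pc, src, delayed=False, immediate=True):
    """Return the RS, RE and RM values set up by RPTB (or RPTBD if delayed)."""
    if immediate:
        re = src + pc + (3 if delayed else 1)   # displacement is PC-relative
    else:
        re = src                                # assumed: register holds the end address
    rs = pc + 4 if delayed else pc + 1          # RPTBD skips its three delay slots
    return {"RS": rs, "RE": re, "RM": 1}

print(repeat_block_setup(0x100, 0x10))
print(repeat_block_setup(0x100, 0x10, delayed=True))
```

For the same displacement, the delayed form moves both the start and the end of the block because the three instructions after RPTBD execute once and are not part of the repeated block.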


Syntax RPTS src


Operation src → RC
1 → ST (RM)
1 → S
Next PC → RS
Next PC → RE


Operands src general addressing modes (G): 
00 _ register 
01 direct 
10 _ indirect 
11 immediate 


Encoding
31 24 23 16 15 8 7 0


Description The RPTS instruction allows a single instruction to be repeated a number 
of times without any penalty for looping. Fetches can also be made from the 
instruction register (IR), thus avoiding repeated memory access. 


The src operand is loaded into the repeat counter (RC). A 1 is written into 
the repeat mode bit of the status register ST (RM). A 1 is also written into 
the repeat single bit (S). This indicates that the program fetches are to be 
performed only from the instruction register. The next PC is loaded into the 
repeat end address (RE) register and the repeat start address (RS) register. 


For the immediate mode, the src operand is assumed to be an unsigned in- 
teger and is not sign-extended. 


Cycles 4 


Status Bits LUF Unaffected. 
LV Unaffected. 
UF Unaffected. 
N Unaffected. 
Z Unaffected. 
V Unaffected. 
C Unaffected. 


Mode Bit OVM Operation is not affected by OVM bit value. 




Example RPTS AR5

Before Instruction:

PC = 123h
ST = 0h
RS = 0h
RE = 0h
RC = 0h
AR5 = 0FFh
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

PC = 124h
ST = 100h
RS = 124h
RE = 124h
RC = 0FFh
AR5 = 0FFh
LUF LV UF N Z V C = 0 0 0 0 0 0 0
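The register updates listed under Operation can be checked against the example with a small software model. The following Python sketch is illustrative only: the dictionary-based state, the function name rpts, and the RM_BIT position (bit 8, inferred from ST = 100h in the example) are assumptions, not part of the device documentation.

```python
RM_BIT = 1 << 8  # assumed position of ST(RM), inferred from ST = 100h above


def rpts(state, src):
    """Return a copy of `state` after RPTS src (hypothetical model)."""
    new = dict(state)
    next_pc = state["PC"] + 1
    new["RC"] = src & 0xFFFFFFFF      # src -> RC
    new["ST"] = state["ST"] | RM_BIT  # 1 -> ST(RM)
    new["S"] = 1                      # 1 -> S (repeat-single bit)
    new["RS"] = next_pc               # Next PC -> RS
    new["RE"] = next_pc               # Next PC -> RE
    new["PC"] = next_pc
    return new


before = {"PC": 0x123, "ST": 0x0, "RS": 0x0, "RE": 0x0, "RC": 0x0, "S": 0}
after = rpts(before, src=0xFF)        # RPTS AR5 with AR5 = 0FFh
```

Because RS and RE both receive the next PC, the repeated block is exactly one instruction long, which is what allows the fetch to be served from the instruction register.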




Reciprocal of Square Root Floating-Point Value RSQRF


Syntax RSQRF src, dst 
Operation 16-bit reciprocal of the square root of src → dst


Operands src extended-precision register, direct and indirect addressing modes 
dst extended-precision register 


Encoding
31 24 23 16 15 8 7 0


Instruction Word Fields
G src addressing modes
extended-precision register (R0-R11)


Description The 16-bit approximation of the reciprocal of the square root of the
number at the src operand is loaded into the dst register. The number at
the src operand is assumed to be positive. The operation for negative
inputs is undefined.


The values at the dst and src operands are assumed to be floating-point
numbers.
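Because the result is only an approximation, software that needs more precision typically refines it. The Python sketch below shows one Newton-Raphson step for the reciprocal square root; it is a generic numerical technique, not a procedure taken from this manual. The function name refine_rsqrt and the artificially perturbed starting estimate are assumptions standing in for the hardware result.

```python
import math


def refine_rsqrt(v, x):
    """One Newton-Raphson step for x ~ 1/sqrt(v): x * (1.5 - 0.5 * v * x * x)."""
    return x * (1.5 - 0.5 * v * x * x)


v = 2.0
exact = 1.0 / math.sqrt(v)
estimate = exact * (1.0 + 1e-4)      # stand-in for a coarse initial estimate
refined = refine_rsqrt(v, estimate)

rel_err_before = abs(estimate - exact) / exact
rel_err_after = abs(refined - exact) / exact
```

Each step roughly squares the relative error, so a small number of steps is usually enough once the starting estimate is reasonably close.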


Cycles 1


Status Bits LUF Unchanged.
LV 1 if input is zero, unchanged otherwise.
UF 0.
N 0.
Z 0.
V 1 if input is zero, 0 otherwise.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


SIGI Signal, interlocked 


Syntax SIGI src, dst 


Operation LOCK (or LLOCK) pin brought low.
src → dst
LOCK (or LLOCK) pin brought high.


Operands src direct and indirect addressing modes (assumed to be signed integer)
dst register mode (assumed to be signed integer)


Encoding
31 24 23 16 15 8 7 0


Instruction Word Fields
G src addressing modes


Description An interlocking operation is signaled using the appropriate bus-lock signal 
(LOCK or LLOCK) if and only if an external memory access is performed. 
The src and dst operands are assumed to be signed integers. After the read 
is performed, the bus-lock signal is deasserted. If an internal memory ac- 
cess is performed, SIGI will perform the read but will not assert a bus-lock 
signal. 


The numbers at the src and dst operands are treated as signed integers. 
Cycles 1


Status Bits If ST(SETCOND) = 0 and the destination register is R0-R11, the
condition flags are modified. If ST(SETCOND) = 1, they are modified for all
destination registers.

LUF Unaffected.
LV Unaffected.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Store Floating-Point Value STF


Syntax STF src, dst

Operation src → dst

Operands src register (R0-R11)

dst general addressing modes (G):
01 direct
10 indirect

Encoding
31 24 23 16 15 8 7 0

Description The src register is loaded into the dst memory location. The src and
dst operands are assumed to be floating-point numbers.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example STF R2,@98A1h

Before Instruction:

DP = 80h
R2 = 05 2C50 1900h = 4.30782204e + 01
Data at 80 98A1h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h
R2 = 05 2C50 1900h = 4.30782204e + 01
Data at 80 98A1h = 052C 5019h = 4.30782204e + 01
LUF LV UF N Z V C = 0 0 0 0 0 0 0
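The hexadecimal and decimal values in the example can be cross-checked in software. The Python sketch below assumes the twos-complement floating-point layout implied by the example values: an 8-bit twos-complement exponent in the upper byte, then a sign bit and a 23-bit fraction with an implied leading bit. It also assumes that STF keeps the exponent and the upper 24 mantissa bits of the 40-bit register. The function names are hypothetical.

```python
def c4x_single_to_float(word):
    """Decode a 32-bit word: 8-bit exponent, sign bit, 23-bit fraction."""
    exponent = (word >> 24) & 0xFF
    if exponent >= 0x80:
        exponent -= 0x100            # twos-complement exponent
    if exponent == -128:
        return 0.0                   # assumed: most negative exponent encodes zero
    sign = (word >> 23) & 1
    fraction = (word & 0x7FFFFF) / float(1 << 23)
    mantissa = (fraction - 2.0) if sign else (1.0 + fraction)
    return mantissa * 2.0 ** exponent


def stf_word(reg40):
    """Model STF: drop the 8 least significant mantissa bits of a 40-bit register."""
    return (reg40 >> 8) & 0xFFFFFFFF


stored = stf_word(0x052C501900)      # R2 from the example
value = c4x_single_to_float(stored)
```

With these assumptions, the stored word and its decoded value agree with the "After Instruction" listing above to the precision printed there.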




STFI Store Floating-Point Value, Interlocked


Syntax STFI src, dst

Operation src → dst
Signal end of interlocked operation.

Operands src register (R0-R11)

dst general addressing modes (G):
01 direct
10 indirect

Encoding
31 24 23 16 15 8 7 0

Description The src register is loaded into the dst memory location. An
interlocked operation is signaled over LOCK or LLOCK. The src and dst
operands are assumed to be floating-point numbers. Refer to Section 7.7 on
page 7-39 for detailed information.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example STFI R3,*-AR4

Before Instruction:

R3 = 07 33C0 0000h = 1.79750e + 02
AR4 = 80 993Ch
Data at 80 993Bh = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R3 = 07 33C0 0000h = 1.79750e + 02
AR4 = 80 993Ch
Data at 80 993Bh = 0733 C000h = 1.79750e + 02
LUF LV UF N Z V C = 0 0 0 0 0 0 0


Parallel Store Floating-Point Value STF||STF


Syntax STF src2, dst2
|| STF src1, dst1

Operation src2 → dst2
|| src1 → dst1

Operands src1 register (Rn1, 0 ≤ n1 ≤ 7)
dst1 indirect (disp = 0, 1, IR0, IR1)
src2 register (Rn2, 0 ≤ n2 ≤ 7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding
31 24 23 16 15 8 7 0

Description Two STF instructions are executed in parallel. Both src1 and src2 are
assumed to be floating-point numbers.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.


Example STF R4,*AR3--
|| STF R3,*++AR5

Before Instruction:

R4 = 07 0C80 0000h = 1.4050e + 02
AR3 = 80 9835h
R3 = 07 33C0 0000h = 1.79750e + 02
AR5 = 80 99D2h
Data at 80 9835h = 0h
Data at 80 99D3h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R4 = 07 0C80 0000h = 1.4050e + 02
AR3 = 80 9834h
R3 = 07 33C0 0000h = 1.79750e + 02
AR5 = 80 99D3h
Data at 80 9835h = 070C 8000h = 1.4050e + 02
Data at 80 99D3h = 0733 C000h = 1.79750e + 02
LUF LV UF N Z V C = 0 0 0 0 0 0 0


Syntax STI src, dst

Operation src → dst

Operands src register (any register in CPU primary register file)

dst general addressing modes (G):
01 direct
10 indirect

Encoding
31 24 23 16 15 8 7 0

Description The src register is loaded into the dst memory location. The src and
dst operands are assumed to be signed integers.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example STI R4,@982Bh

Before Instruction:

DP = 80h
R4 = 42BD7h = 273,367
Data at 80 982Bh = 0E5FCh = 58,876
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h
R4 = 42BD7h = 273,367
Data at 80 982Bh = 42BD7h = 273,367
LUF LV UF N Z V C = 0 0 0 0 0 0 0


STII Store Integer, Interlocked


Syntax STII src, dst

Operation src → dst
Signal end of interlocked operation.

Operands src register (any register in CPU primary register file)

dst general addressing modes (G):
01 direct
10 indirect

Encoding
31 24 23 16 15 8 7 0

Description The src register is loaded into the dst memory location. An
interlocked operation is signaled over LOCK or LLOCK. The src and dst
operands are assumed to be signed integers. Refer to Section 7.7 on page
7-39 for detailed information.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example STII R1,@98AEh

Before Instruction:

DP = 80h
R1 = 78Dh
Data at 80 98AEh = 25Ch

After Instruction:

DP = 80h
R1 = 78Dh
Data at 80 98AEh = 78Dh


STI||STI Parallel STI and STI


Syntax STI src2, dst2
|| STI src1, dst1

Operation src2 → dst2
|| src1 → dst1

Operands src1 register (R0-R7)
dst1 indirect (disp = 0, 1, IR0, IR1)
src2 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding
31 24 23 16 15 8 7 0

Description Two integer stores are performed in parallel. If both stores are
executed to the same address, the value written is that of STI src2, dst2.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.


Example STI R0,*++AR2(IR0)
|| STI R5,*AR0

Before Instruction:

R0 = 0DCh = 220
AR2 = 80 9830h
IR0 = 8h
R5 = 35h = 53
AR0 = 80 98D3h
Data at 80 9838h = 0h
Data at 80 98D3h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R0 = 0DCh = 220
AR2 = 80 9838h
IR0 = 8h
R5 = 35h = 53
AR0 = 80 98D3h
Data at 80 9838h = 0DCh = 220
Data at 80 98D3h = 35h = 53
LUF LV UF N Z V C = 0 0 0 0 0 0 0


Store Integer Immediate Value STIK 


Syntax STIK src, dst

Operation src → dst

Operands src 5-bit signed integer
dst direct and indirect mode

Encoding
31 24 23 16 15 8 7 0

Instruction Word Fields
G dst addressing modes
00 direct mode

Description The 5-bit signed integer src value is loaded into the dst memory
location. The src and dst operands are assumed to be signed integers.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
SUBB Subtract Integer With Borrow


Syntax SUBB src, dst

Operation dst - src - C → dst

Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate

dst register (any register in CPU primary register file)

Encoding
31 24 23 16 15 8 7 0

Description The difference of the dst, src, and C operands, as calculated above,
is loaded into the dst register. The dst and src operands are assumed to be
signed integers.

Cycles 1

Status Bits If ST(SETCOND) = 0 and the destination register is R0-R11, the
condition flags are modified. If ST(SETCOND) = 1, they are modified for all
destination registers.

LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C 1 if a borrow occurs, 0 otherwise.

Mode Bit OVM Operation is affected by OVM bit value.

Example SUBB *AR5++(4),R5

Before Instruction:

AR5 = 80 9800h
R5 = 0FAh = 250
Data at 80 9800h = 0C7h = 199
LUF LV UF N Z V C = 0 0 0 0 0 0 1

After Instruction:

AR5 = 80 9804h
R5 = 032h = 50
Data at 80 9800h = 0C7h = 199
LUF LV UF N Z V C = 0 0 0 0 0 0 0
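A subtract-with-borrow operation is typically used to chain word-sized subtractions into a wider one. The Python sketch below models dst - src - C on 32-bit words and uses it to subtract two 64-bit values. It is a minimal illustration under stated assumptions: the function names are hypothetical, overflow flags and OVM saturation are not modeled, and the first call merely plays the role of a plain subtract that produces the borrow.

```python
MASK32 = 0xFFFFFFFF


def sub_with_borrow(dst, src, c):
    """Return (result, borrow) for dst - src - c on 32-bit words."""
    full = dst - src - c
    return full & MASK32, 1 if full < 0 else 0


def sub64(a, b):
    """Subtract two unsigned 64-bit values one 32-bit word at a time."""
    lo, borrow = sub_with_borrow(a & MASK32, b & MASK32, 0)  # low words, no borrow in
    hi, _ = sub_with_borrow(a >> 32, b >> 32, borrow)        # high words consume the borrow
    return (hi << 32) | lo


# Values from the SUBB example: 250 - 199 - 1 = 50, with no borrow out.
r5, c = sub_with_borrow(0xFA, 0xC7, 1)
```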


SUBB3 Subtract Integer With Borrow, 3 Operands


Syntax SUBB3 src2, src1, dst

Operation src1 - src2 - C → dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes

dst register mode (any register in CPU primary register file)

Encoding
Type 1
31 24 23 16 15 8 7 0

src1 addressing modes                  src2 addressing modes
register mode (any CPU register)       register mode (any CPU register)
indirect mode (disp = 0, 1, IR0, IR1)

Type 2
31 24 23 16 15 8 7 0

T   src1 addressing modes                  src2 addressing modes
00  register mode (any CPU register)       8-bit signed immediate
01  register mode (any CPU register)       indirect mode *+ARn(5-bit unsigned
                                           displacement)
10  indirect mode *+ARn(5-bit unsigned     8-bit signed immediate
    displacement)
11  indirect mode *+ARn1(5-bit unsigned    indirect mode *+ARn2(5-bit unsigned
    displacement)                          displacement)

Description The difference of the src1 and src2 operands and the C (carry) flag
is loaded into the dst register. The src1, src2, and dst operands are
assumed to be signed integers.

Cycles 1

Status Bits If ST(SETCOND) = 0 and the destination register is R0-R11, the
condition flags are modified. If ST(SETCOND) = 1, they are modified for all
destination registers.

LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C 1 if a borrow is generated, 0 otherwise.

Mode Bit OVM Operation is affected by OVM bit value.


Subtract Integer Conditionally SUBC


Syntax SUBC src, dst

Operation If (dst - src ≥ 0):
(dst - src << 1) OR 1 → dst
Else:
dst << 1 → dst

Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate

dst register (any register in CPU primary register file)

Encoding
31 24 23 16 15 8 7 0

Description The src operand is subtracted from the dst operand. The dst operand
is loaded with a value that depends upon the result of the subtraction. If
(dst - src) is greater than or equal to zero, then (dst - src) is
left-shifted one bit, the least-significant bit is set to 1, and the result
is loaded into the dst register. If (dst - src) is less than zero, dst is
left-shifted one bit and loaded into the dst register. The dst and src
operands are assumed to be unsigned integers.

SUBC may be used to perform a single step of a multi-bit integer division.
See subsection 12.3.4 for a detailed description.

Cycles 1

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
Example SUBC @98C5h,R1

Before Instruction:

DP = 80h
R1 = 04F6h = 1270
Data at 80 98C5h = 492h = 1170
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h
R1 = 0C9h = 201
Data at 80 98C5h = 492h = 1170
LUF LV UF N Z V C = 0 0 0 0 0 0 0

Example SUBC 3000,R0 (3000 = 0BB8h)

Before Instruction:

R0 = 07D0h = 2000
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R0 = 0FA0h = 4000
LUF LV UF N Z V C = 0 0 0 0 0 0 0
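The two examples above, and the way repeated SUBC steps build up a quotient, can be sketched in software. The Python model below is illustrative only and is not the routine from subsection 12.3.4. It assumes the comparison "dst - src ≥ 0" is an unsigned comparison, restricts operands to 16 bits so that no intermediate value exceeds 32 bits, and uses the hypothetical names subc and divide16.

```python
MASK32 = 0xFFFFFFFF


def subc(dst, src):
    """One conditional-subtract step on unsigned 32-bit values."""
    if dst >= src:
        return (((dst - src) << 1) | 1) & MASK32
    return (dst << 1) & MASK32


def divide16(dividend, divisor):
    """Unsigned 16-bit division built from 16 SUBC steps."""
    if not (0 <= dividend < (1 << 16) and 0 < divisor < (1 << 16)):
        raise ValueError("operands must fit in 16 bits and divisor must be nonzero")
    acc = dividend
    aligned = divisor << 15           # divisor aligned with the dividend's top bit
    for _ in range(16):
        acc = subc(acc, aligned)
    return acc & 0xFFFF, acc >> 16    # (quotient, remainder)
```

After the last step the quotient occupies the low 16 bits of the accumulator and the remainder sits above it, which is why a single register can hold both results.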




Syntax SUBF src, dst

Operation dst - src → dst

Operands src general addressing modes (G):
00 register (R0-R11)
01 direct
10 indirect
11 immediate

dst register (R0-R11)

Encoding
31 24 23 16 15 8 7 0

Description The result of the dst operand minus the src operand is loaded into
the dst register. The dst and src operands are assumed to be floating-point
numbers.

Cycles 1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV 1 if a floating-point overflow occurs, unchanged otherwise.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if a floating-point overflow occurs, 0 otherwise.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example SUBF *AR0--(IR0),R5

Before Instruction:

AR0 = 80 9888h
IR0 = 80h
R5 = 07 33C0 0000h = 1.79750000e + 02
Data at 80 9888h = 070C 8000h = 1.4050e + 02
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR0 = 80 9808h
IR0 = 80h
R5 = 05 1D00 0000h = 3.9250e + 01
Data at 80 9888h = 070C 8000h = 1.4050e + 02
LUF LV UF N Z V C = 0 0 0 0 0 0 0
Subtract Floating-Point Value, 3 Operands SUBF3


Syntax SUBF3 src2, src1, dst

Operation src1 - src2 → dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes

dst register mode (R0-R11)

Encoding
Type 1
31 24 23 16 15 8 7 0
Type 2
31 24 23 16 15 8 7 0

00 register mode (R0-R11)

src1 addressing modes                  src2 addressing modes
register mode (any CPU register)       indirect mode *+ARn(5-bit unsigned
                                       displacement)
11 indirect mode *+ARn1(5-bit unsigned indirect mode *+ARn2(5-bit unsigned
   displacement)                       displacement)

Description The difference of the src1 and src2 operands is loaded into the dst
register. The src1, src2, and dst operands are assumed to be floating-point
numbers.

Cycles 1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV 1 if a floating-point overflow occurs, unchanged otherwise.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if a floating-point overflow occurs, 0 otherwise.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.
SUBF3||STF Parallel SUBF3 and STF


Syntax SUBF3 src1, src2, dst1
|| STF src3, dst2

Operation src2 - src1 → dst1
|| src3 → dst2

Operands src1 register (R0-R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding
31 24 23 16 15 8 7 0

Description A floating-point subtraction and a floating-point store are performed
in parallel. All registers are read at the beginning and loaded at the end
of the execute cycle. This means that if one of the parallel operations
(STF) reads from a register and the operation being performed in parallel
(SUBF3) writes to the same register, then STF accepts as input the contents
of the register before it is modified by the SUBF3.

If src3 and dst1 point to the same location, src3 is read before the write
to dst1.

Cycles 1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV 1 if a floating-point overflow occurs, unchanged otherwise.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if a floating-point overflow occurs, 0 otherwise.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.


Example SUBF3 R1,*-AR4(IR1),R0
|| STF R7,*+AR5(IR0)

Before Instruction:

R1 = 05 7B40 0000h = 6.28125e + 01
AR4 = 80 98B8h
IR1 = 8h
R0 = 0h
R7 = 07 33C0 0000h = 1.79750e + 02
AR5 = 80 9850h
IR0 = 10h
Data at 80 98B0h = 070C 8000h = 1.4050e + 02
Data at 80 9860h = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R1 = 05 7B40 0000h = 6.28125e + 01
AR4 = 80 98B8h
IR1 = 8h
R0 = 06 1B60 0000h = 7.768750e + 01
R7 = 07 33C0 0000h = 1.79750e + 02
AR5 = 80 9850h
IR0 = 10h
Data at 80 98B0h = 070C 8000h = 1.4050e + 02
Data at 80 9860h = 0733 C000h = 1.79750e + 02
LUF LV UF N Z V C = 0 0 0 0 0 0 0
SUBI Subtract Integer


Syntax SUBI src, dst

Operation dst - src → dst

Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate

dst register (any register in CPU primary register file)

Encoding
31 24 23 16 15 8 7 0

Description The difference of the dst operand minus the src operand is loaded
into the dst register. The dst and src operands are assumed to be signed
integers.

Cycles 1

Status Bits If ST(SETCOND) = 0 and the destination register is R0-R11, the
condition flags are modified. If ST(SETCOND) = 1, they are modified for all
destination registers.

LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C 1 if a borrow occurs, 0 otherwise.

Mode Bit OVM Operation is affected by OVM bit value.

Example SUBI 220,R7

Before Instruction:

R7 = 226h = 550
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R7 = 14Ah = 330
LUF LV UF N Z V C = 0 0 0 0 0 0 0


SUBI3 Subtract Integer, 3 Operands


Syntax SUBI3 src2, src1, dst

Operation src1 - src2 → dst

Operands src1, src2 both type 1 or type 2 three-operand addressing modes

dst register mode (any register in CPU primary register file)

Encoding
Type 1
31 24 23 16 15 8 7 0
Type 2
31 24 23 16 15 8 7 0

Instruction Word Fields
src1 addressing modes                  src2 addressing modes
register mode (any CPU register)

Description The result of the src1 operand minus the src2 operand is loaded into
the dst register. The src1, src2, and dst operands are assumed to be signed
integers.

Cycles 1

Status Bits If ST(SETCOND) = 0 and the destination register is R0-R11, the
condition flags are modified. If ST(SETCOND) = 1, they are modified for all
destination registers.

LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C 1 if a borrow is generated, 0 otherwise.

Mode Bit OVM Operation is affected by OVM bit value.


Parallel SUBI3 and STI SUBI3||STI


Syntax SUBI3 src1, src2, dst1
|| STI src3, dst2

Operation src2 - src1 → dst1
|| src3 → dst2

Operands src1 register (R0-R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0-R7)
src3 register (R0-R7)
dst2 indirect (disp = 0, 1, IR0, IR1)

Encoding
31 24 23 16 15 8 7 0

Description An integer subtraction and an integer store are performed in
parallel. All registers are read at the beginning and loaded at the end of
the execute cycle. This means that if one of the parallel operations (STI)
reads from a register and the operation being performed in parallel (SUBI3)
writes to the same register, then STI accepts as input the contents of the
register before it is modified by the SUBI3.

If src3 and dst1 point to the same location, src3 is read before the write
to dst1.

Cycles 1

Status Bits LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C 1 if a borrow occurs, 0 otherwise.

Mode Bit OVM Operation is affected by OVM bit value.
Example SUBI3 R7,*+AR2(IR0),R1
|| STI R3,*++AR7

Before Instruction:

R7 = 14h = 20
AR2 = 80 982Fh
IR0 = 10h
R1 = 0h
R3 = 35h = 53
AR7 = 80 983Bh
Data at 80 983Fh = 0DCh = 220
Data at 80 983Ch = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

R7 = 14h = 20
AR2 = 80 982Fh
IR0 = 10h
R1 = 0C8h = 200
R3 = 35h = 53
AR7 = 80 983Ch
Data at 80 983Fh = 0DCh = 220
Data at 80 983Ch = 35h = 53
LUF LV UF N Z V C = 0 0 0 0 0 0 0
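The read-before-write rule for parallel instructions can be sketched as a two-phase update. The Python model below is a simplified illustration, not a description of the hardware: the function name subi3_sti and the dictionary-based register and memory state are assumptions, effective addresses are passed in already computed, and address-register updates and status flags are left out.

```python
def subi3_sti(regs, mem, src1, src2_addr, dst1, src3, dst2_addr):
    """Model SUBI3 src1,*src2_addr,dst1 || STI src3,*dst2_addr on copies of the state."""
    # Read phase: every operand is sampled before anything is written.
    difference = (mem[src2_addr] - regs[src1]) & 0xFFFFFFFF
    stored = regs[src3]
    # Write phase: results are committed at the end of the cycle.
    new_regs = dict(regs)
    new_mem = dict(mem)
    new_regs[dst1] = difference
    new_mem[dst2_addr] = stored
    return new_regs, new_mem


# Values from the example above.
regs = {"R7": 20, "R1": 0, "R3": 53}
mem = {0x80983F: 220, 0x80983C: 0}
regs_after, mem_after = subi3_sti(regs, mem, "R7", 0x80983F, "R1", "R3", 0x80983C)

# Same register used as SUBI3 destination and STI source: the old value is stored.
regs2, mem2 = subi3_sti({"R7": 20, "R1": 7}, {0x10: 220, 0x20: 0},
                        "R7", 0x10, "R1", "R1", 0x20)
```

In the second call the memory word receives 7, the value R1 held before the subtraction, while R1 itself ends up holding the new difference.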


Subtract Reverse Integer With Borrow SUBRB


Syntax SUBRB src, dst

Operation src - dst - C → dst

Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate

dst register (any register in CPU primary register file)

Encoding
31 24 23 16 15 8 7 0

Description The difference of the src, dst, and C operands, as calculated above,
is loaded into the dst register. The dst and src operands are assumed to be
signed integers.

Cycles 1

Status Bits If ST(SETCOND) = 0 and the destination register is R0-R11, the
condition flags are modified. If ST(SETCOND) = 1, they are modified for all
destination registers.

LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C 1 if a borrow occurs, 0 otherwise.

Mode Bit OVM Operation is affected by OVM bit value.

Example SUBRB R4,R6

Before Instruction:

R4 = 03CBh = 971
R6 = 0258h = 600
LUF LV UF N Z V C = 0 0 0 0 0 0 1

After Instruction:

R4 = 03CBh = 971
R6 = 0172h = 370
LUF LV UF N Z V C = 0 0 0 0 0 0 0
SUBRF Subtract Reverse Floating-Point Values


Syntax SUBRF src, dst

Operation src - dst → dst

Operands src general addressing modes (G):
00 register (R0-R11)
01 direct
10 indirect
11 immediate

dst register (R0-R11)

Encoding
31 24 23 16 15 8 7 0

Description The result of the src operand minus the dst operand is loaded into
the dst register. The dst and src operands are assumed to be floating-point
numbers.

Cycles 1

Status Bits LUF 1 if a floating-point underflow occurs, unchanged otherwise.
LV 1 if a floating-point overflow occurs, unchanged otherwise.
UF 1 if a floating-point underflow occurs, 0 otherwise.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if a floating-point overflow occurs, 0 otherwise.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.

Example SUBRF @9905h,R5

Before Instruction:

DP = 80h
R5 = 05 7B40 0000h = 6.281250e + 01
Data at 80 9905h = 0733 C000h = 1.79750e + 02
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

DP = 80h
R5 = 06 69E0 0000h = 1.16937500e + 02
Data at 80 9905h = 0733 C000h = 1.79750e + 02
LUF LV UF N Z V C = 0 0 0 0 0 0 0


Subtract Reverse Integer SUBRI


Syntax SUBRI src, dst

Operation src - dst → dst

Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate

dst register (any register in CPU primary register file)

Encoding
31 24 23 16 15 8 7 0

Description The result of the src operand minus the dst operand is loaded into
the dst register. The dst and src operands are assumed to be signed
integers.

Cycles 1

Status Bits If ST(SETCOND) = 0 and the destination register is R0-R11, the
condition flags are modified. If ST(SETCOND) = 1, they are modified for all
destination registers.

LUF Unaffected.
LV 1 if an integer overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an integer overflow occurs, 0 otherwise.
C 1 if a borrow occurs, 0 otherwise.

Mode Bit OVM Operation is affected by OVM bit value.

Example SUBRI *AR5++(IR0),R3

Before Instruction:

AR5 = 80 9900h
IR0 = 8h
R3 = 0DCh = 220
Data at 80 9900h = 226h = 550
LUF LV UF N Z V C = 0 0 0 0 0 0 0

After Instruction:

AR5 = 80 9908h
IR0 = 8h
R3 = 014Ah = 330
Data at 80 9900h = 226h = 550
LUF LV UF N Z V C = 0 0 0 0 0 0 0


Syntax SWI

Operation Performs an emulation interrupt

Operands None

Encoding
31 24 23 16 15 8 7 0
011001 11/0000/000000000000000000000

Description The SWI instruction performs an emulator interrupt. This is a reserved
instruction and should not be used in normal programming.

Cycles 4

Status Bits LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.

Mode Bit OVM Operation is not affected by OVM bit value.


Syntax TOIEEE src, dst

Operation convert src to IEEE format → dst

Operands src extended-precision register (R0-R11),
direct and indirect addressing modes
dst extended-precision register

Encoding
31 24 23 16 15 8 7 0

Instruction Word Fields
G src addressing modes
register mode (extended-precision register)

Description The src operand is converted from a twos-complement floating-point
format to the IEEE floating-point format.

The src operand is assumed to be a single-precision floating-point number.
The converted result goes into the 32 MSBs of the dst register. STF can be
used to store the result to memory.
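The conversion can be illustrated in software. The Python sketch below decodes a 32-bit twos-complement floating-point word and re-encodes the value as an IEEE 754 single-precision bit pattern using the standard struct module. The assumed input layout (8-bit twos-complement exponent, sign bit, 23-bit fraction with an implied leading bit, exponent -128 meaning zero) is inferred from the example values in this chapter. The function name is hypothetical, and the overflow case reported through the LV and V flags is not modeled.

```python
import struct


def c4x_to_ieee_bits(word):
    """Convert a 32-bit twos-complement float word to IEEE 754 single bits."""
    exponent = (word >> 24) & 0xFF
    if exponent >= 0x80:
        exponent -= 0x100                      # twos-complement exponent
    if exponent == -128:
        value = 0.0                            # assumed encoding of zero
    else:
        sign = (word >> 23) & 1
        fraction = (word & 0x7FFFFF) / float(1 << 23)
        mantissa = (fraction - 2.0) if sign else (1.0 + fraction)
        value = mantissa * 2.0 ** exponent
    # Pack as big-endian IEEE single, then reinterpret the bytes as an integer.
    return struct.unpack(">I", struct.pack(">f", value))[0]


ieee_bits = c4x_to_ieee_bits(0x0733C000)       # 179.75 in the source format
```

Note that the two formats place the sign differently: the source format keeps it next to the fraction, while IEEE 754 puts it in the most significant bit, so the conversion is more than a change of exponent bias.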


Cycles 1 


Status Bits LUF Unaffected.
LV 1 if an overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an overflow occurs, 0 otherwise.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


TO 


Syntax TOIEEE src2, dst1 
| | STF src3, dst2 


Operation convert src2 to IEEE format > dst? 
in parallel with 
src3— dst2 


Operands src2__indirect mode (disp = 0, 1, IRO, IR1) 
| dst? register mode (Rn1, 0 <n1 <7) 
src3 register mode (Rn1, 0 <n1 < 7) 
dst2 indirect mode (disp = 0, 1, IRO, IR1) 
Encoding 
31 24 23 16 15 87 


0 
anny Bene hd ee” 


Description The src2 operand is converted from the twos-complement floating-point
format to the IEEE floating-point format.


The src2 operand is assumed to be a single-precision floating-point number.
The converted result goes into the 32 MSBs of the dst1 register. In parallel,
a floating-point store is done.


If src2 and dst2 point to the same location, then src2 is read before the write
to dst2.


Cycles 1 


Status Bits LUF Unaffected.
LV 1 if an overflow occurs, unchanged otherwise.
UF 0.
N 1 if a negative result is generated, 0 otherwise.
Z 1 if a zero result is generated, 0 otherwise.
V 1 if an overflow occurs, 0 otherwise.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


TRAPcond Trap Conditionally


Syntax TRAPcond N


Operation If (cond is true)
ST(GIE) → ST(PGIE)
ST(CF) → ST(PCF)
0 → ST(GIE)
1 → ST(CF)
next PC → *(++SP)
trap vector N → PC
Else, continue.


Operands N immediate mode (0 ≤ N ≤ 511)


Encoding
[Encoding diagram: bit positions 31 24 23 16 15 8 7 0]


Description If traps are to be nested, you may need to save the status register before
executing TRAPcond. If the condition is true, then GIE and CF are saved
in PGIE and PCF in the status register, all interrupts are disabled (0 → GIE),
and the cache is frozen (1 → CF). Then, the contents of the PC are pushed
onto the system stack, and the PC is loaded with the contents of the specified
trap vector (N). If the condition is not true, normal operation continues.


Cycles


Status Bits GIE Set to 0 if TRAP executes.
LUF Unaffected.
LV Unaffected.
UF Unaffected.
N Unaffected.
Z Unaffected.
V Unaffected.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.
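
The sequence of state changes performed by TRAPcond can be sketched as a small behavioral model. The class and function names below are hypothetical, and the model assumes that trap vector N is the word at address TVTP + N and that the next PC is PC + 1; it ignores pipeline timing entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    pc: int = 0
    sp: int = 0
    tvtp: int = 0          # trap-vector table pointer
    gie: int = 1           # global interrupt enable
    pgie: int = 0          # previous GIE
    cf: int = 0            # cache freeze
    pcf: int = 0           # previous CF
    mem: dict = field(default_factory=dict)


def trap_cond(cpu: Cpu, cond: bool, n: int) -> None:
    """Model TRAPcond N: save GIE/CF, disable interrupts, freeze cache, vector."""
    next_pc = cpu.pc + 1
    if not cond:
        cpu.pc = next_pc                 # condition false: continue
        return
    cpu.pgie, cpu.pcf = cpu.gie, cpu.cf  # ST(GIE) -> ST(PGIE), ST(CF) -> ST(PCF)
    cpu.gie, cpu.cf = 0, 1               # 0 -> ST(GIE), 1 -> ST(CF)
    cpu.sp += 1                          # next PC -> *(++SP)
    cpu.mem[cpu.sp] = next_pc
    cpu.pc = cpu.mem[cpu.tvtp + n]       # trap vector N -> PC (assumed TVTP + N)


if __name__ == "__main__":
    cpu = Cpu(pc=0x600, sp=0x2FFF00, tvtp=0x400, mem={0x405: 0x700})
    trap_cond(cpu, True, 5)
    print(hex(cpu.pc), hex(cpu.mem[cpu.sp]), cpu.gie, cpu.pgie)
```

Because PGIE and PCF hold only one previous value, a nested trap would overwrite them, which is why the status register may need to be saved first.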
TSTB Test Bit Fields


Syntax TSTB src, dst


Operation dst AND src


Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate


dst register (any register in CPU primary register file)


Encoding
[Encoding diagram: bit positions 31 24 23 16 15 8 7 0]


Description The bitwise logical AND of the dst and src operands is formed, but the result
is not loaded in any register. This allows for nondestructive compares. The
dst and src operands are assumed to be unsigned integers.


Cycles 1


Status Bits These condition flags are modified for all destination registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example TSTB *-AR4(1),R5


Before Instruction:
AR4 = 80 99C5h
R5 = 898h = 2200
Data at 80 99C4h = 767h = 1895
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:
AR4 = 80 99C5h
R5 = 898h = 2200
Data at 80 99C4h = 767h = 1895
LUF LV UF N Z V C = 0 0 0 0 1 0 0
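
The flag results in this example can be checked with a short model of the AND test. This is a sketch only; the function name is hypothetical and just the four flags that TSTB writes are modelled.

```python
def tstb(src: int, dst: int) -> dict:
    """Form dst AND src without storing it and return the affected flags."""
    result = (dst & src) & 0xFFFFFFFF
    return {
        "UF": 0,
        "N": (result >> 31) & 1,      # MSB of the output
        "Z": int(result == 0),        # 1 if a zero output is generated
        "V": 0,
    }


if __name__ == "__main__":
    # *-AR4(1) with AR4 = 80 99C5h addresses 80 99C4h, which holds 767h.
    print(hex(0x8099C5 - 1), tstb(0x767, 0x898))
```

Since 898h and 767h have no set bits in common, the AND is zero and Z is set, while R5 and the memory operand are left unchanged.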
TSTB3 Test Bit Fields, 3 Operands


Syntax TSTB3 src2, src1


Operation src1 AND src2


Operands src1, src2 both type 1 or type 2 three-operand addressing modes


Encoding
[Type 1 and Type 2 encoding diagrams: bit positions 31 24 23 16 15 8 7 0]


Type 1
T   src1 addressing modes                   src2 addressing modes
00  register mode (any CPU register)        register mode (any CPU register)
01  indirect mode (disp = 0, 1, IR0, IR1)   register mode (any CPU register)
10  register mode (any CPU register)        indirect mode (disp = 0, 1, IR0, IR1)
11  indirect mode (disp = 0, 1, IR0, IR1)   indirect mode (disp = 0, 1, IR0, IR1)


Type 2
T   src1 addressing modes                                 src2 addressing modes
00  register mode (any CPU register)                      8-bit signed immediate
01  register mode (any CPU register)                      indirect mode *+ARn(5-bit unsigned displacement)
10  indirect mode *+ARn(5-bit unsigned displacement)      8-bit signed immediate
11  indirect mode *+ARn1(5-bit unsigned displacement)     indirect mode *+ARn2(5-bit unsigned displacement)
Description The bitwise logical AND between the src1 and src2 operands is formed but
is not loaded into any register. This allows for nondestructive compares. The
src1 and src2 operands are assumed to be signed integers.


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.
XOR Bitwise Exclusive OR


Syntax XOR src, dst


Operation dst XOR src → dst


Operands src general addressing modes (G):
00 register (any register in CPU primary register file)
01 direct
10 indirect
11 immediate


dst register (any register in CPU primary register file)


Encoding
[Encoding diagram: bit positions 31 24 23 16 15 8 7 0]


Description The bitwise exclusive OR of the src and dst operands is loaded into the dst
register. The dst and src operands are assumed to be unsigned integers.


Cycles 1


Status Bits If ST(SETCOND) = 0 and the destination register is R0 – R11, the condition
flags are modified. If ST(SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.


Example XOR R1,R2


Before Instruction:
R1 = 0F FA32h
R2 = 0F F5C1h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:
R1 = 0F FA32h
R2 = 00 0FF3h
LUF LV UF N Z V C = 0 0 0 0 0 0 0
XOR3 Bitwise Exclusive OR, 3 Operands


Syntax XOR3 src2, src1, dst


Operation src1 XOR src2 → dst


Operands src1, src2 both type 1 or type 2 three-operand addressing modes


dst register mode (any register in CPU primary register file)


Encoding
[Type 1 and Type 2 encoding diagrams: bit positions 31 24 23 16 15 8 7 0]


Instruction Word Fields


Type 1
T   src1 addressing modes                   src2 addressing modes
00  register mode (any CPU register)        register mode (any CPU register)
01  indirect mode (disp = 0, 1, IR0, IR1)   register mode (any CPU register)
10  register mode (any CPU register)        indirect mode (disp = 0, 1, IR0, IR1)
11  indirect mode (disp = 0, 1, IR0, IR1)   indirect mode (disp = 0, 1, IR0, IR1)


Type 2
T   src1 addressing modes                                 src2 addressing modes
00  register mode (any CPU register)                      8-bit signed immediate
01  register mode (any CPU register)                      indirect mode *+ARn(5-bit unsigned displacement)
10  indirect mode *+ARn(5-bit unsigned displacement)      8-bit signed immediate
11  indirect mode *+ARn1(5-bit unsigned displacement)     indirect mode *+ARn2(5-bit unsigned displacement)
Description The bitwise exclusive OR between the src1 and src2 operands is loaded into
the dst register. The src1, src2, and dst operands are assumed to be signed
integers.


Cycles 1


Status Bits If ST(SETCOND) = 0 and the destination register is R0 – R11, the condition
flags are modified. If ST(SETCOND) = 1, they are modified for all destination
registers.
LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.
XOR3||STI Parallel XOR3 and STI


Syntax XOR3 src2, src1, dst1
|| STI src3, dst2


Operation src1 XOR src2 → dst1
|| src3 → dst2


Operands src1 register (R0 – R7)
src2 indirect (disp = 0, 1, IR0, IR1)
dst1 register (R0 – R7)
src3 register (R0 – R7)
dst2 indirect (disp = 0, 1, IR0, IR1)


Encoding
[Encoding diagram: bit positions 31 24 23 16 15 8 7 0]


Description A bitwise exclusive OR and an integer store are performed in parallel. All
registers are read at the beginning and loaded at the end of the execute
cycle. This means that if one of the parallel operations (STI) reads from a
register and the operation being performed in parallel (XOR3) writes to the
same register, then STI accepts as input the contents of the register before
it is modified by the XOR3.


If src2 and dst2 point to the same location, src2 is read before the write to
dst2.
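
The read-then-write ordering described above can be sketched as a two-phase model. The function name and the dictionary representation of registers and memory are hypothetical; auxiliary-register updates and status flags are not modelled.

```python
def xor3_sti(regs, mem, src1, src2_addr, dst1, src3, dst2_addr):
    """Model XOR3 || STI: all operands are read before any result is written."""
    # Read phase: sample every source first.
    a = regs[src1]
    b = mem[src2_addr]
    store_value = regs[src3]
    # Write phase: commit both results.
    regs[dst1] = (a ^ b) & 0xFFFFFFFF
    mem[dst2_addr] = store_value


if __name__ == "__main__":
    # STI reads R3 while XOR3 writes R3: the old value of R3 is stored.
    regs = {"R3": 0x85}
    mem = {0x10: 0xFF, 0x20: 0x0}
    xor3_sti(regs, mem, "R3", 0x10, "R3", "R3", 0x20)
    print(hex(regs["R3"]), hex(mem[0x20]))
```

Sampling all sources before committing any result is what makes the two hazard cases in the description well defined: STI stores the old register value, and src2 is read before dst2 is overwritten.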


Cycles 1


Status Bits LUF Unaffected.
LV Unaffected.
UF 0.
N MSB of the output.
Z 1 if a zero output is generated, 0 otherwise.
V 0.
C Unaffected.


Mode Bit OVM Operation is not affected by OVM bit value.
Example XOR3 *AR1++,R3,R3
|| STI R6,*-AR2(IR0)


Before Instruction:
AR1 = 80 987Eh
R3 = 85h
R6 = 0DCh = 220
AR2 = 80 98B4h
IR0 = 8h
Data at 80 987Eh = 85h
Data at 80 98ACh = 0h
LUF LV UF N Z V C = 0 0 0 0 0 0 0


After Instruction:
AR1 = 80 987Fh
R3 = 0h
R6 = 0DCh = 220
AR2 = 80 98B4h
IR0 = 8h
Data at 80 987Eh = 85h
Data at 80 98ACh = 0DCh = 220
LUF LV UF N Z V C = 0 0 0 0 0 0 0
Chapter 12


Software Applications


This chapter explains how to use the instruction set, the architecture, and
the interface of the ’C40. It presents coding examples for frequently used
applications and discusses more involved examples and applications. It
also defines the principles involved in each application and gives the
corresponding assembly-language code for instructional purposes and for
immediate use. Whenever the detailed explanation of the underlying theory is too
extensive to be included in this manual, appropriate references are given
for further information.


Major topics discussed in this chapter are listed below:


12.1 Processor Initialization


12.1.1 Reset Process 


Before you execute a DSP algorithm, it is necessary to initialize the proces- 
sor. Generally, initialization takes place any time the processor is reset. 


When reset is activated by applying a low level to the RESET input for several
cycles, the ’C40 terminates execution and puts the RESET vector in the
program counter. The RESET vector of the ’C40 may be mapped to one of four
different locations, controlled by the value of the RESETLOC(1,0) pins at
RESET (shown in Table 12-1). The RESET vector normally contains the
address of the system initialization routine. The hardware reset also
initializes various registers and status bits (reset conditions are further
defined in Section 6.6 on page 6-18).


Table 12-1. Relationship of RESETLOC(1,0) Pins to RESET Vector Location


RESETLOC(1,0)   RESET Vector Address
0 0             0000 0000h
0 1             7FFF FFFFh
1 0             8000 0000h
1 1             FFFF FFFFh


After reset, initialize the processor further by executing instructions that set 
up operational modes, memory pointers, interrupts, and the remaining 
functions needed to meet system requirements. 

12.1.2 Initialization 


To configure the processor at reset, the following internal functions should
be initialized:


□ CPU expansion register file
□ Memory-mapped registers
□ Interrupt structure
Example 12-1 shows coding for initializing the ’C40 to the following machine
state, in addition to the initialization performed during the hardware
reset (for conditions after hardware reset, see Section 6.6 on page 6-18):


□ All interrupts are enabled.
□ The program cache is enabled.
□ The overflow mode is disabled.
□ The data memory page pointer is set to zero.
□ The stack pointer is set to internal RAM address 002FFF00h.
□ The internal memory is filled with zeros.


Note that all constants larger than 16 bits should be placed in memory and 
accessed through direct or indirect addressing. 


Example 12-1. Processor Initialization Example 


*
*   TITLE 'PROCESSOR INITIALIZATION EXAMPLE'
*
        .global RESET,INIT,BEGIN
        .global TIME0,TIME1,TINT0,TINT1
        .global NMI,INT0,INT1,INT2,INT3
        .global NON_MASK,ISR0,ISR1,ISR2,ISR3
        .global ICFULL0,ICRDY0,OCRDY0,OCEMPTY0
        .global ICFSR0,ICRSR0,OCRSR0,OCESR0
        .global ICFULL1,ICRDY1,OCRDY1,OCEMPTY1
        .global ICFSR1,ICRSR1,OCRSR1,OCESR1
        .global ICFULL2,ICRDY2,OCRDY2,OCEMPTY2
        .global ICFSR2,ICRSR2,OCRSR2,OCESR2
        .global ICFULL3,ICRDY3,OCRDY3,OCEMPTY3
        .global ICFSR3,ICRSR3,OCRSR3,OCESR3
        .global ICFULL4,ICRDY4,OCRDY4,OCEMPTY4
        .global ICFSR4,ICRSR4,OCRSR4,OCESR4
        .global ICFULL5,ICRDY5,OCRDY5,OCEMPTY5
        .global ICFSR5,ICRSR5,OCRSR5,OCESR5
        .global DINT0,DINT1,DINT2,DINT3,DINT4,DINT5
        .global DMA0,DMA1,DMA2,DMA3,DMA4,DMA5
        .global TRAP0,TRAP1,TRAP2,TRAP3,TRAP4,TRAP5
        .global TRP0,TRP1,TRP2,TRP3,TRP4,TRP5
*
*   PROCESSOR INITIALIZATION FOR THE TMS320C40.
*
*   RESET AND INTERRUPT VECTOR SPECIFICATION. THIS ARRANGEMENT
*   ASSUMES THAT DURING LINKING, THE FOLLOWING TEXT SEGMENT
*   WILL BE PLACED TO START AT MEMORY LOCATION AS:
*
*   SEGMENT NAME  |  MEMORY LOCATION
*   --------------+------------------------------
*   reset_adr     |  SAME AS RESETLOC(1,0) SETUP
*   int_vect      |  00000200h
*   trap_vect     |  00000400h
*   .data         |  00000500h
*   .text         |  00000600h
*
*   NOTE THAT THE INTERRUPT AND TRAP VECTORS TABLE CAN BE
*   RELOCATED TO A 512-WORD BOUNDARY BY CHANGING THE VALUES
*   OF THE IVTP AND TVTP.
*
         .sect   "reset_adr"     ; Named section for RESET vector
         .word   INIT            ; RS- load address INIT to PC
*
         .sect   "int_vect"      ; Named section for interrupt structures
         .space  1               ; Reserved space
NMI      .word   NON_MASK        ; Non Maskable Interrupt NMI- loads
                                 ; address NMI to PC
TINT0    .word   TIME0           ; Timer 0 interrupt processing
*
INT0     .word   ISR0            ; INT0- loads address INT0 to PC
INT1     .word   ISR1            ; INT1- loads address INT1 to PC
INT2     .word   ISR2            ; INT2- loads address INT2 to PC
INT3     .word   ISR3            ; INT3- loads address INT3 to PC
         .space  6               ; Reserved space
*
ICFULL0  .word   ICFSR0          ; Comm. port 0 input full processing
ICRDY0   .word   ICRSR0          ; Comm. port 0 input ready processing
OCRDY0   .word   OCRSR0          ; Comm. port 0 output ready processing
OCEMPTY0 .word   OCESR0          ; Comm. port 0 output empty processing
*
ICFULL1  .word   ICFSR1          ; Comm. port 1 input full processing
ICRDY1   .word   ICRSR1          ; Comm. port 1 input ready processing
OCRDY1   .word   OCRSR1          ; Comm. port 1 output ready processing
OCEMPTY1 .word   OCESR1          ; Comm. port 1 output empty processing
*
ICFULL2  .word   ICFSR2          ; Comm. port 2 input full processing
ICRDY2   .word   ICRSR2          ; Comm. port 2 input ready processing
OCRDY2   .word   OCRSR2          ; Comm. port 2 output ready processing
OCEMPTY2 .word   OCESR2          ; Comm. port 2 output empty processing
*
ICFULL3  .word   ICFSR3          ; Comm. port 3 input full processing
ICRDY3   .word   ICRSR3          ; Comm. port 3 input ready processing
OCRDY3   .word   OCRSR3          ; Comm. port 3 output ready processing
OCEMPTY3 .word   OCESR3          ; Comm. port 3 output empty processing
*
ICFULL4  .word   ICFSR4          ; Comm. port 4 input full processing
ICRDY4   .word   ICRSR4          ; Comm. port 4 input ready processing
OCRDY4   .word   OCRSR4          ; Comm. port 4 output ready processing
OCEMPTY4 .word   OCESR4          ; Comm. port 4 output empty processing
*
ICFULL5  .word   ICFSR5          ; Comm. port 5 input full processing
ICRDY5   .word   ICRSR5          ; Comm. port 5 input ready processing
OCRDY5   .word   OCRSR5          ; Comm. port 5 output ready processing
OCEMPTY5 .word   OCESR5          ; Comm. port 5 output empty processing
*
DINT0    .word   DMA0            ; DMA Channel 0 interrupt
DINT1    .word   DMA1            ; DMA Channel 1 interrupt
DINT2    .word   DMA2            ; DMA Channel 2 interrupt
DINT3    .word   DMA3            ; DMA Channel 3 interrupt
DINT4    .word   DMA4            ; DMA Channel 4 interrupt
DINT5    .word   DMA5            ; DMA Channel 5 interrupt
TINT1    .word   TIME1           ; Timer 1 interrupt processing
         .space  21              ; Reserved space
*
         .sect   "trap_vect"     ; Named section for trap structures
TRAP0    .word   TRP0            ; Trap 0 vector processing begins
TRAP1    .word   TRP1            ; Trap 1 vector processing begins
TRAP2    .word   TRP2            ; Trap 2 vector processing begins
TRAP3    .word   TRP3            ; Trap 3 vector processing begins
TRAP4    .word   TRP4            ; Trap 4 vector processing begins
TRAP5    .word   TRP5            ; Trap 5 vector processing begins
         .space  506             ; Leave space for the other 506 traps
*
*   IN THIS SECTION, CONSTANTS THAT CANNOT BE REPRESENTED
*   IN THE SHORT FORMAT ARE INITIALIZED.
         .data
MASK     .word   0FFFFFFFFH
BLK0     .word   02FF800H        ; Beginning address of RAM block 0
BLK1     .word   02FFC00H        ; Beginning address of RAM block 1
CTRL     .word   0100000H        ; Pointer for peripheral-bus memory map
GLOINT   .word   0000000H        ; Init of global memory interface control (0)
LOCALINT .word   0000000H        ; Init of local memory interface control (4)
DMA0CTL  .word   0000000H        ; Initialization for DMA 0 control (160)
DMA1CTL  .word   0000000H        ; Initialization for DMA 1 control (176)
DMA2CTL  .word   0000000H        ; Initialization for DMA 2 control (192)
DMA3CTL  .word   0000000H        ; Initialization for DMA 3 control (208)
DMA4CTL  .word   0000000H        ; Initialization for DMA 4 control (224)
DMA5CTL  .word   0000000H        ; Initialization for DMA 5 control (240)
CP0CTL   .word   0000000H        ; Init of comm. port 0 control (64)
CP1CTL   .word   0000000H        ; Init of comm. port 1 control (80)
CP2CTL   .word   0000000H        ; Init of comm. port 2 control (96)
CP3CTL   .word   0000000H        ; Init of comm. port 3 control (112)
CP4CTL   .word   0000000H        ; Init of comm. port 4 control (128)
CP5CTL   .word   0000000H        ; Init of comm. port 5 control (144)
TIM0CTL  .word   0000000H        ; Initialization of timer 0 control (32)
TIM1CTL  .word   0000000H        ; Initialization of timer 1 control (48)
ANACTL   .word   0000000H        ; Initialization of analysis module (16)
STCK     .word   02FFF00H        ; Beginning of stack
*
         .text
*
*   THE ADDRESS AT RESET VECTOR DIRECTS EXECUTION TO BEGIN HERE
*   FOR RESET PROCESSING THAT INITIALIZES THE PROCESSOR. WHEN
*   RESET IS APPLIED, THE FOLLOWING REGISTERS ARE INITIALIZED
*   TO ZERO:
*
*   ST   -- CPU STATUS REGISTER
*   DIE  -- DMA INTERRUPT ENABLE REGISTER
*   IIE  -- INTERNAL INTERRUPT ENABLE REGISTER
*   IIF  -- IIOF PINS AND INTERRUPT FLAG REGISTER
*
*   IVTP -- INTERRUPT-VECTOR TABLE POINTER
*   TVTP -- TRAP-VECTOR TABLE POINTER
*
*   THE STATUS REGISTER HAS THE FOLLOWING ARRANGEMENT:
*
*   BITS:      31-17  16        15    14    13   12  11  10
*   FUNCTION:  RESRV  ANALYSIS  SET   PGIE  GIE  CC  CE  CF
*                     IDLE      COND
*
*   BITS:      9    8   7    6    5   4   3  2  1  0
*   FUNCTION:  PCF  RM  OVM  LUF  LV  UF  N  Z  V  C
*
INIT     LDPK    MASK            ; Point the DP register to page 0
         LDI     1800H,ST        ; Clear and enable cache, and
                                 ; disable OVM.
*   SET UP IVTP AND TVTP TO 200H AND 400H
         LDI     0200H,AR0       ; Set Primary Register AR0 to 200H
         LDPE    AR0,IVTP        ; Set Expansion Register IVTP to 200H
         ADDI    0200H,AR0       ; Set Primary Register AR0 to 400H
         LDPE    AR0,TVTP        ; Set Expansion Register TVTP to 400H
         LDI     @MASK,IIE       ; Enable all interrupts
*
*   INTERNAL DATA MEMORY INITIALIZATION TO FLOATING POINT ZERO
*
         LDI     @BLK0,AR0       ; AR0 points to block 0
         LDI     @BLK1,AR1       ; AR1 points to block 1
         LDF     0.0,R0          ; Zero register R0
         RPTS    1023            ; Repeat 1024 times
         STF     R0,*AR0++(1)    ; Zero out location in RAM block 0 and
||       STF     R0,*AR1++(1)    ; Zero out location in RAM block 1.
*
*   THE PROCESSOR IS INITIALIZED. THE REMAINING APPLICATION-
*   DEPENDENT PART OF THE SYSTEM (BOTH ON- AND OFF-CHIP) SHOULD
*   NOW BE INITIALIZED.
*
*   FIRST, INITIALIZE THE CONTROL REGISTERS. IN THIS EXAMPLE,
*   EVERYTHING IS INITIALIZED TO ZERO SINCE THE ACTUAL
*   INITIALIZATION IS APPLICATION DEPENDENT.
*
         LDI     @CTRL,AR0       ; Load in AR0 the pointer to control
                                 ; registers
         LDI     @GLOINT,R0
         STI     R0,*AR0         ; Init global memory interface control
         LDI     @LOCALINT,R0
         STI     R0,*+AR0(4)     ; Init local memory interface control
         LDI     @DMA0CTL,R0
         STI     R0,*+AR0(160)   ; Init DMA 0 control
         LDI     @DMA1CTL,R0
         STI     R0,*+AR0(176)   ; Init DMA 1 control
         LDI     @DMA2CTL,R0
         STI     R0,*+AR0(192)   ; Init DMA 2 control
         LDI     @DMA3CTL,R0
         STI     R0,*+AR0(208)   ; Init DMA 3 control
         LDI     @DMA4CTL,R0
         STI     R0,*+AR0(224)   ; Init DMA 4 control
         LDI     @DMA5CTL,R0
         STI     R0,*+AR0(240)   ; Init DMA 5 control
         LDI     @CP0CTL,R0
         STI     R0,*+AR0(64)    ; Init communication port 0 control
         LDI     @CP1CTL,R0
         STI     R0,*+AR0(80)    ; Init communication port 1 control
         LDI     @CP2CTL,R0
         STI     R0,*+AR0(96)    ; Init communication port 2 control
         LDI     @CP3CTL,R0
         STI     R0,*+AR0(112)   ; Init communication port 3 control
         LDI     @CP4CTL,R0
         STI     R0,*+AR0(128)   ; Init communication port 4 control
         LDI     @CP5CTL,R0
         STI     R0,*+AR0(144)   ; Init communication port 5 control
         LDI     @TIM0CTL,R0
         STI     R0,*+AR0(32)    ; Init timer 0 control
         LDI     @TIM1CTL,R0
         STI     R0,*+AR0(48)    ; Init timer 1 control
         LDI     @ANACTL,R0
         STI     R0,*+AR0(16)    ; Init analysis module control
*
         LDI     @STCK,SP        ; Initialize the stack pointer
         OR      2000H,ST        ; Global interrupt enable
*
         BR      BEGIN           ; Branch to the beginning of
                                 ; application.
         .end
12.2 Program Control


TMS320C40 instructions provide program control and facilitate high-speed
processing. These instructions directly handle:


□ Regular and zero-overhead subroutine calls
□ Software stack
□ Interrupts
□ Delayed branches
□ Single- and multiple-instruction loops without any overhead


12.2.1 Subroutines 


The ’C40 provides two ways to invoke subroutines: regular calls and
zero-overhead calls. A regular call saves the return address on the software
stack (addressed by SP); a zero-overhead call saves it in extended-precision
register R11. The following subsections use example programs to explain
how this works.


12.2.1.1 Regular Subroutine Calls 


The ’C40 has a 32-bit program counter (PC) and a practically unlimited software
stack. The CALL and CALLcond subroutine calls increment the stack
pointer and store the next value of the PC on the stack. At the end of the
subroutine, RETScond performs a conditional return.


Example 12-2 illustrates the use of a subroutine to determine the dot product
between two vectors. Given two vectors of length N, represented by the
arrays a[0], a[1], ..., a[N-1] and b[0], b[1], ..., b[N-1], the dot product is
computed from the expression


d = a[0] b[0] + a[1] b[1] + ... + a[N-1] b[N-1]


Processing proceeds in the main routine to the point where the dot product
is to be computed. It is assumed that the arguments of the subroutine have
been appropriately initialized. At this point, a CALL transfers control to the
subroutine; when execution has completed, the RETS instruction returns
control to the calling routine. Note that for this particular example, it would
suffice to save register R2. However, a larger number of registers are saved for
demonstration purposes. The saved registers are stored on the system
stack, which should be large enough to accommodate the maximum anticipated
storage requirements. Other methods of saving registers could be
used equally well.
Example 12-2. Regular Subroutine Call (Dot Product)


*
*   TITLE REGULAR SUBROUTINE CALL (DOT PRODUCT)
*
*
*   MAIN ROUTINE THAT CALLS THE SUBROUTINE 'DOT' TO COMPUTE THE
*   DOT PRODUCT OF TWO VECTORS.
*
         LDI     @blk0,AR0       ; AR0 points to vector a
         LDI     @blk1,AR1       ; AR1 points to vector b
         LDI     N,RC            ; RC contains the number of elements
         CALL    DOT
*
*   SUBROUTINE DOT
*
*   EQUATION: d = a(0) * b(0) + a(1) * b(1) + ... + a(N-1) * b(N-1)
*
*   THE DOT PRODUCT OF a AND b IS PLACED IN REGISTER R0. N MUST
*   BE GREATER THAN OR EQUAL TO 2.
*
*   ARGUMENT ASSIGNMENTS:
*   ARGUMENT  | FUNCTION
*   ----------+------------------------
*   AR0       | ADDRESS OF a(0)
*   AR1       | ADDRESS OF b(0)
*   RC        | LENGTH OF VECTORS (N)
*
*   REGISTERS USED AS INPUT: AR0, AR1, RC
*   REGISTER MODIFIED: R0
*   REGISTER CONTAINING RESULT: R0
*
         .global DOT
*
DOT      PUSH    ST              ; Save status register
         PUSH    R2              ; Use the stack to save R2's
         PUSHF   R2              ; bottom 32 and top 32 bits
         PUSH    AR0             ; Save AR0
         PUSH    AR1             ; Save AR1
         PUSH    RC              ; Save RC
         PUSH    RS
         PUSH    RE
*                                ; Initialize R0:
         MPYF3   *AR0,*AR1,R0    ; a(0) * b(0) -> R0
||       SUBF    R2,R2,R2        ; Initialize R2.
         SUBI    2,RC            ; Set RC = N-2
*
*   DOT PRODUCT (1 <= i < N)
*
         RPTS    RC              ; Set up the repeat single.
         MPYF3   *++AR0(1),*++AR1(1),R0  ; a(i) * b(i) -> R0
||       ADDF3   R0,R2,R2        ; a(i-1)*b(i-1) + R2 -> R2
*
         ADDF3   R0,R2,R0        ; a(N-1)*b(N-1) + R2 -> R0
*
*   RETURN SEQUENCE
*
         POP     RE
         POP     RS
         POP     RC              ; Restore RC
         POP     AR1             ; Restore AR1
         POP     AR0             ; Restore AR0
         POPF    R2              ; Restore top 32 bits of R2
         POP     R2              ; Restore bottom 32 bits of R2
         POP     ST              ; Restore ST
         RETS                    ; Return
*
         .end
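
The loop in Example 12-2 is software pipelined: each repeated MPYF3 || ADDF3 pair forms the current product while accumulating the previous one, and a final ADDF3 adds the last product. The following reference model is a sketch of that dataflow only; the function name is hypothetical, and Python floats stand in for the extended-precision registers, so rounding may differ from the hardware.

```python
def dot_pipelined(a, b):
    """Mirror the register dataflow of the DOT subroutine (requires N >= 2)."""
    n = len(a)
    if n < 2 or n != len(b):
        raise ValueError("vectors must have equal length N >= 2")
    r0 = a[0] * b[0]          # MPYF3 *AR0,*AR1,R0
    r2 = 0.0                  # || SUBF R2,R2,R2
    # RPTS RC with RC = N-2 executes the parallel pair N-1 times.
    for i in range(1, n):
        # Both halves read the old R0 and R2 before either is written.
        r0, r2 = a[i] * b[i], r0 + r2
    return r0 + r2            # ADDF3 R0,R2,R0 adds the last product


if __name__ == "__main__":
    print(dot_pipelined([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The tuple assignment models the parallel instruction pair: the accumulate uses the product from the previous iteration, which is why the subroutine needs the extra ADDF3 after the loop and requires N to be at least 2.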


12.2.1.2 Zero-Overhead Subroutine Calls 


Two ’C40 instructions, link and jump (LAJ) and link and jump conditional
(LAJcond), allow zero-overhead subroutine calls to be implemented on the
’C40. Unlike CALL and CALLcond, which put the value of PC+1 onto the
software stack, LAJ and LAJcond put the value of PC+4 into the
extended-precision register R11. The three instructions following LAJ or LAJcond
are executed before control transfers to the subroutine. The restrictions on
these three instructions are the same as those on the three instructions
following a delayed branch. At the end of the subroutine, a delayed branch
conditional, BcondD, using the register addressing mode with R11 as source,
can be used to perform a zero-overhead subroutine return.


For comparison, the same dot product example with zero-overhead subrou- 
tine call is given in the following example program. 
Example 12-3. Zero-Overhead Subroutine Call (Dot Product)


*
*   TITLE ZERO-OVERHEAD SUBROUTINE CALL (DOT PRODUCT)
*
*
*   MAIN ROUTINE THAT CALLS THE SUBROUTINE 'DOT' TO COMPUTE THE
*   DOT PRODUCT OF TWO VECTORS.
*
         LAJ     DOT
         LDI     @blk0,AR0       ; AR0 points to vector a
         LDI     @blk1,AR1       ; AR1 points to vector b
         LDI     N,RC            ; RC contains the number of elements
*
*   SUBROUTINE DOT
*
*   EQUATION: d = a(0) * b(0) + a(1) * b(1) + ... + a(N-1) * b(N-1)
*
*   THE DOT PRODUCT OF a AND b IS PLACED IN REGISTER R0. N MUST
*   BE GREATER THAN OR EQUAL TO 2.
*
*   ARGUMENT ASSIGNMENTS:
*   ARGUMENT  | FUNCTION
*   ----------+------------------------
*   AR0       | ADDRESS OF a(0)
*   AR1       | ADDRESS OF b(0)
*   RC        | LENGTH OF VECTORS (N)
*
*   REGISTERS USED AS INPUT: AR0, AR1, RC
*   REGISTER MODIFIED: R0
*   REGISTER CONTAINING RESULT: R0
*
         .global DOT
*
DOT      PUSH    ST              ; Save status register
         PUSH    R2              ; Use the stack to save R2's
         PUSHF   R2              ; bottom 32 and top 32 bits
         PUSH    AR0             ; Save AR0
         PUSH    AR1             ; Save AR1
         PUSH    RC              ; Save RC
         PUSH    RS
         PUSH    RE
*                                ; Initialize R0:
         MPYF3   *AR0,*AR1,R0    ; a(0) * b(0) -> R0
||       SUBF    R2,R2,R2        ; Initialize R2.
         SUBI    2,RC            ; Set RC = N-2
*
*   DOT PRODUCT (1 <= i < N)
*
         RPTS    RC              ; Set up the repeat single
         MPYF3   *++AR0(1),*++AR1(1),R0  ; a(i) * b(i) -> R0
||       ADDF3   R0,R2,R2        ; a(i-1)*b(i-1) + R2 -> R2
*
         ADDF3   R0,R2,R0        ; a(N-1)*b(N-1) + R2 -> R0
*
*   RETURN SEQUENCE
*
         POP     RE
         POP     RS
         POP     RC              ; Restore RC
         POP     AR1             ; Restore AR1
         POP     AR0             ; Restore AR0
         BUD     R11             ; Return
         POPF    R2              ; Restore top 32 bits of R2
         POP     R2              ; Restore bottom 32 bits of R2
         POP     ST              ; Restore ST
*
         .end
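
The difference between the two call mechanisms comes down to where the return address is kept and which address it is. The sketch below models only that bookkeeping; the function names and the dictionary-based stack and register file are hypothetical, and instruction timing is not modelled.

```python
def regular_call(pc, target, sp, stack):
    """CALL at address pc: push PC+1 onto the software stack, then jump."""
    sp += 1
    stack[sp] = pc + 1
    return target, sp


def link_and_jump(pc, target, regs):
    """LAJ at address pc: save PC+4 in R11; three following instructions run first."""
    regs["R11"] = pc + 4
    delay_slots = [pc + 1, pc + 2, pc + 3]
    return target, delay_slots


if __name__ == "__main__":
    stack, regs = {}, {}
    print(regular_call(0x600, 0x700, 0x2FFF00, stack), stack)
    print(link_and_jump(0x600, 0x700, regs), regs)
```

In Example 12-3 the three delay slots after LAJ hold the LDI instructions that set up AR0, AR1, and RC, so the argument setup costs no extra cycles.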


12.2.2 Software Stack 


The location of the ’C40 software stack is determined by the contents of the
stack pointer register (SP). The stack pointer increments from low to high
values, and provisions should be made to accommodate the anticipated
storage requirements. The stack can be used not only during the subroutine
CALL and RETS, but also inside the subroutine as temporary storage for
registers, as shown in Example 12-2. SP always points to the last
value pushed onto the stack.


The CALL and CALLcond instructions push the value of the program counter
onto the stack, as do the interrupt routines. Then, RETScond and
RETIcond pop the stack and place the value in the program counter. The
integer value of any register can be pushed onto and popped off the stack
by using the PUSH and POP instructions.


Two additional instructions, PUSHF and POPF, are for floating-point numbers.
These instructions can be used to push and pop floating-point numbers
from and to registers R0 – R11. This feature is very useful for saving the
extended-precision registers (see Example 12-2 and Example 12-3). You can
use PUSH and PUSHF on the same register to save the lower 32 and upper
32 bits. PUSH saves the lower 32 bits; PUSHF, the upper 32 bits. To recover
this extended-precision number, execute a POPF followed by POP. It is
important to do the integer and floating-point PUSH and POP in the above
order, because POPF forces the last eight bits of the extended-precision
register to zero.
The stack pointer (SP) can be read as well as written to. Multiple stacks for
different program segments may be easily created. SP is not initialized by
the hardware during reset; therefore, it is important to initialize
its value so that SP points to a predetermined memory location. This avoids
the problem of SP attempting to write into ROM or write over other useful
data.
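
The stack discipline described in this subsection can be sketched as follows. The class and function names are hypothetical. The model treats an extended-precision register as a 40-bit integer and assumes that POP replaces only the lower 32 bits of the register, leaving the upper eight bits untouched.

```python
MASK32 = 0xFFFFFFFF
MASK40 = 0xFFFFFFFFFF


class Stack:
    """Upward-growing stack; SP points to the last value pushed."""

    def __init__(self, sp):
        self.sp = sp
        self.mem = {}

    def push_word(self, word):
        self.sp += 1                      # pre-increment, then store
        self.mem[self.sp] = word & MASK32

    def pop_word(self):
        word = self.mem[self.sp]          # read, then post-decrement
        self.sp -= 1
        return word


def push(stack, reg40):
    stack.push_word(reg40 & MASK32)       # PUSH saves the lower 32 bits


def pushf(stack, reg40):
    stack.push_word(reg40 >> 8)           # PUSHF saves the upper 32 bits


def popf(stack):
    return (stack.pop_word() << 8) & MASK40   # low eight bits forced to zero


def pop(stack, reg40):
    # Assumed: POP writes the lower 32 bits and keeps bits 39-32.
    return (reg40 & MASK40 & ~MASK32) | stack.pop_word()


if __name__ == "__main__":
    s = Stack(0x2FFF00)
    r2 = 0x123456789A
    push(s, r2)
    pushf(s, r2)
    print(hex(pop(s, popf(s))), hex(s.sp))
```

Reversing the restore order (POP followed by POPF) would leave the low eight bits of the register at zero, which is the reason the order matters.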


12.2.3 Interrupt Service Routines 
There are two types of interrupts on the ’C40: maskable and nonmaskable.
The maskable interrupts include internal and external interrupts. All the
interrupts are vectored and prioritized. The vector table for the various
interrupts is located relative to the interrupt-vector table pointer (IVTP, shown
in Section 3.2 on page 3-15). The nonmaskable interrupt (NMI) has priority
over all other interrupts. Unlike other interrupts, the NMI cannot be
masked by its own mask or by the GIE bit in the ST. It is temporarily masked
during delayed branches and multicycle CPU operations.


When an interrupt occurs, the corresponding flag is set in the interrupt flag
register (IIF, explained in subsection 3.1.10, page 3-12). For the nonmaskable
interrupt, when the NMI flag is set, interrupt processing begins as long as
the CPU is not executing a delayed branch or a multicycle operation. For a
maskable interrupt to be serviced when its flag is set, the GIE bit in the ST
must be set to enable maskable interrupts globally, and the corresponding bit
in the interrupt enable register (IIE, described in subsection 3.1.9, page
3-10) or, for external interrupts, in the IIF register must also be set. Since
pins IIOF(3-0) can be either general-purpose I/O or external interrupt pins,
you must configure those pins as interrupt pins (using the IIF register) to
enable an external interrupt. When the IIOF(3-0) pins are configured as
interrupt pins, they can also be configured (again in the IIF register) as
either edge-triggered or level-triggered interrupts. You can also write to the
IIF register, which makes it possible to force an interrupt by software or to
clear interrupts without processing them.
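The acceptance conditions described above can be summarized as two small predicates. This is only a behavioral sketch in Python, not ’C40 code; the function and parameter names are hypothetical, and it models only the conditions named in the text.

```python
def nmi_accepted(nmi_flag, in_delayed_branch, in_multicycle_op):
    """NMI is taken when its flag is set, unless it is temporarily masked."""
    return nmi_flag and not (in_delayed_branch or in_multicycle_op)


def maskable_accepted(flag, gie, enable_bit):
    """A maskable interrupt needs its flag, the GIE bit, and its own enable bit."""
    return flag and gie and enable_bit
```

For example, `maskable_accepted(True, False, True)` is false: a pending interrupt stays pending while GIE is cleared, which is the situation the polling technique below relies on.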


The interrupt flag register can be read, and action can be taken depending on
whether the interrupt has occurred. This is true even when the maskable
interrupt is disabled, which can be useful when an interrupt-driven interface
is not implemented. Example 12-4 shows the case in which a subroutine is
called when external interrupt 1 has not occurred.
Example 12-4. Use of Interrupts for Software Polling

*       TITLE INTERRUPT POLLING
*
        TSTB    40H,IIF         ; Test if interrupt 1 has occurred
        CALLZ   SUBROUTINE      ; If not, call subroutine


When interrupt processing begins, the program counter is pushed on the 
stack, and the interrupt vector is loaded in the program counter. Interrupts 
are then disabled by setting GIE=0, and the program continues from the ad- 
dress loaded in the program counter. Since all maskable interrupts are dis- 
abled, interrupt processing may proceed without further interruption unless 
the interrupt service routine re-enables interrupts, or the NMI occurs. 


Except for very simple interrupt service routines, it is important to assure 
that the processor context is saved during execution of this routine. The 
context must be saved before you execute the routine itself, and it must be 
restored after the routine is finished. The procedure is called context switch- 
ing. Context switching is also useful for subroutine calls, especially when 
extensive use is made of the auxiliary and the extended-precision registers. 
Code examples of context switching and an interrupt service routine are 
provided in this section. 


12.2.3.1 Context Switching 


Context switching is commonly required when processing a subroutine call
or interrupt. It may be quite extensive or simple, depending on system
requirements. For the ’C40, the program counter is automatically pushed onto
the stack. Important information in other ’C40 registers, such as the status,
auxiliary, or extended-precision registers, must be saved explicitly by
instructions. The status register should be saved first and restored last in
order to preserve the processor status without any further change caused by
other context-switching instructions.
Example 12-5 and Example 12-6 show saving and restoring of the ’C40
state. In both examples, the stack is used for saving the registers, and it
expands towards higher addresses. If you don’t want to use the stack pointed
to by the SP, you can create a separate stack by using an auxiliary register
as the stack pointer. The following registers are saved in these examples:

    Status register (ST), which should be saved first and restored last
    Extended-precision registers R0 through R11
    Auxiliary registers AR0 through AR7
    Data-page pointer (DP)
    Index registers (IR0 and IR1)
    Block-size register (BK)
    Interrupt-related registers IIE, IIF, and DIE
    Repeat-related registers RS, RE, and RC
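The ordering rule (ST saved first and restored last, everything else restored in the reverse of the save order) can be sketched with a plain list used as a stack. This is a Python model for illustration only. The names `SAVE_ORDER`, `save_context`, and `restore_context` are hypothetical, and each extended-precision register is treated as a single stack entry here, which is a simplification.

```python
# Save order: ST first, then the register groups listed above.
SAVE_ORDER = (
    ["ST"]
    + [f"R{i}" for i in range(12)]
    + [f"AR{i}" for i in range(8)]
    + ["DP", "IR0", "IR1", "BK", "IIE", "IIF", "DIE", "RS", "RE", "RC"]
)


def save_context(regs, stack):
    """Push every register in SAVE_ORDER; the stack grows at the end of the list."""
    for name in SAVE_ORDER:
        stack.append(regs[name])


def restore_context(regs, stack):
    """Pop in the reverse order, so ST is the last register restored."""
    for name in reversed(SAVE_ORDER):
        regs[name] = stack.pop()
```

Because the restore walks the same list backwards, any register added to `SAVE_ORDER` is automatically restored at the matching position.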
Example 12-5. Context-Save for the TMS320C40

*
*       TITLE CONTEXT-SAVE FOR THE TMS320C40
*
        .global SAVE
*
*       CONTEXT SAVE ON SUBROUTINE CALL OR INTERRUPT.
*
SAVE:
        PUSH    ST              ; Save status register
*
*       SAVE THE EXTENDED PRECISION REGISTERS
*
        PUSH    R0              ; Save the lower 32 bits
        PUSHF   R0              ; and the upper 32 bits of R0
        PUSH    R1              ; Save the lower 32 bits
        PUSHF   R1              ; and the upper 32 bits of R1
        PUSH    R2              ; Save the lower 32 bits
        PUSHF   R2              ; and the upper 32 bits of R2
        PUSH    R3              ; Save the lower 32 bits
        PUSHF   R3              ; and the upper 32 bits of R3
        PUSH    R4              ; Save the lower 32 bits
        PUSHF   R4              ; and the upper 32 bits of R4
        PUSH    R5              ; Save the lower 32 bits
        PUSHF   R5              ; and the upper 32 bits of R5
        PUSH    R6              ; Save the lower 32 bits
        PUSHF   R6              ; and the upper 32 bits of R6
        PUSH    R7              ; Save the lower 32 bits
        PUSHF   R7              ; and the upper 32 bits of R7
        PUSH    R8              ; Save the lower 32 bits
        PUSHF   R8              ; and the upper 32 bits of R8
        PUSH    R9              ; Save the lower 32 bits
        PUSHF   R9              ; and the upper 32 bits of R9
        PUSH    R10             ; Save the lower 32 bits
        PUSHF   R10             ; and the upper 32 bits of R10
        PUSH    R11             ; Save the lower 32 bits
        PUSHF   R11             ; and the upper 32 bits of R11
*
*       SAVE THE AUXILIARY REGISTERS
*
        PUSH    AR0             ; Save AR0
        PUSH    AR1             ; Save AR1
        PUSH    AR2             ; Save AR2
        PUSH    AR3             ; Save AR3
        PUSH    AR4             ; Save AR4
        PUSH    AR5             ; Save AR5
        PUSH    AR6             ; Save AR6
        PUSH    AR7             ; Save AR7
*
*       SAVE THE REST REGISTERS FROM THE REGISTER FILE
*
        PUSH    DP              ; Save data page pointer
        PUSH    IR0             ; Save index register IR0
        PUSH    IR1             ; Save index register IR1
        PUSH    BK              ; Save block-size register
        PUSH    IIE             ; Save interrupt enable register
        PUSH    IIF             ; Save interrupt flag register
        PUSH    DIE             ; Save DMA interrupt enable register
        PUSH    RS              ; Save repeat start address
        PUSH    RE              ; Save repeat end address
        PUSH    RC              ; Save repeat counter
*
*       SAVE IS COMPLETE
*


Example 12-6. Context-Restore for the TMS320C40

*
*       TITLE CONTEXT-RESTORE FOR THE TMS320C40
*
        .global RESTR
*
*       CONTEXT RESTORE AT THE END OF A SUBROUTINE CALL OR INTERRUPT.
*
RESTR:
*
*       RESTORE THE REST REGISTERS FROM THE REGISTER FILE
*
        POP     RC              ; Restore repeat counter
        POP     RE              ; Restore repeat end address
        POP     RS              ; Restore repeat start address
        POP     DIE             ; Restore DMA interrupt enable register
        POP     IIF             ; Restore interrupt flag register
        POP     IIE             ; Restore interrupt enable register
        POP     BK              ; Restore block-size register
        POP     IR1             ; Restore index register IR1
        POP     IR0             ; Restore index register IR0
        POP     DP              ; Restore data page pointer
*
*       RESTORE THE AUXILIARY REGISTERS
*
        POP     AR7             ; Restore AR7
        POP     AR6             ; Restore AR6
        POP     AR5             ; Restore AR5
        POP     AR4             ; Restore AR4
        POP     AR3             ; Restore AR3
        POP     AR2             ; Restore AR2
        POP     AR1             ; Restore AR1
        POP     AR0             ; Restore AR0
*
*       RESTORE THE EXTENDED PRECISION REGISTERS
*
        POPF    R11             ; Restore the upper 32 bits and
        POP     R11             ; the lower 32 bits of R11
        POPF    R10             ; Restore the upper 32 bits and
        POP     R10             ; the lower 32 bits of R10
        POPF    R9              ; Restore the upper 32 bits and
        POP     R9              ; the lower 32 bits of R9
        POPF    R8              ; Restore the upper 32 bits and
        POP     R8              ; the lower 32 bits of R8
        POPF    R7              ; Restore the upper 32 bits and
        POP     R7              ; the lower 32 bits of R7
        POPF    R6              ; Restore the upper 32 bits and
        POP     R6              ; the lower 32 bits of R6
        POPF    R5              ; Restore the upper 32 bits and
        POP     R5              ; the lower 32 bits of R5
        POPF    R4              ; Restore the upper 32 bits and
        POP     R4              ; the lower 32 bits of R4
        POPF    R3              ; Restore the upper 32 bits and
        POP     R3              ; the lower 32 bits of R3
        POPF    R2              ; Restore the upper 32 bits and
        POP     R2              ; the lower 32 bits of R2
        POPF    R1              ; Restore the upper 32 bits and
        POP     R1              ; the lower 32 bits of R1
        POPF    R0              ; Restore the upper 32 bits and
        POP     R0              ; the lower 32 bits of R0
        POP     ST              ; Restore status register
*
*       RESTORE IS COMPLETE
*


12.2.3.2 Interrupt-Vector Table 


The interrupt-vector table (IVT, shown in Figure 3-8 on page 3-16) of the
’C40 is relocatable. The location of the IVT is relative to the interrupt-vector
table pointer (IVTP). The IVTP is a 32-bit expansion register that points to
the base address of the IVT. Since the IVT is required to lie on a 512-word
boundary, the 9 LSBs of the IVTP should always be zero. Two instructions,
LDEP and LDPE, read from and write to the expansion registers IVTP
and trap-vector table pointer (TVTP). Example 12-7 shows how to change
the value of the IVTP (changing the value of the TVTP is similar). With
this relocatable feature, one interrupt signal can be used for different
services. In Example 12-7, the IVTP is changed in the external INT0 interrupt
service routines EINT0A and EINT0B. After the value of the IVTP is changed,
the CPU goes to a different interrupt service routine when the same interrupt
signal occurs again.
Example 12-7. Use of One Interrupt Signal for Two Different Services

*
*       TITLE USE OF ONE INTERRUPT SIGNAL FOR TWO DIFFERENT SERVICES
*
*       IN THIS EXAMPLE, THE ADDRESSES OF EINT0A AND EINT0B ARE IN
*       MEMORY LOCATIONS 03H AND 1003H, RESPECTIVELY. IT IS ASSUMED THAT
*       THE IVTP HAS NOT BEEN CHANGED AFTER DEVICE RESET AND THAT THE
*       EXTERNAL INTERRUPT IIOF0 IS ENABLED. WHEN THE FIRST IIOF0
*       INTERRUPT SIGNAL COMES IN, THE EINT0A ROUTINE IS EXECUTED. WHEN
*       THE NEXT IIOF0 INTERRUPT SIGNAL OCCURS, THE EINT0B ROUTINE IS
*       EXECUTED, AND SO ON. THE EINT0A AND EINT0B ROUTINES TAKE TURNS
*       BEING EXECUTED WHEN THE IIOF0 INTERRUPT SIGNAL OCCURS.
*
*       External IIOF0 interrupt service routine A
*
        .global EINT0A
EINT0A:
        LDI     1000H,R0        ; Change IVTP to point to 1000H
        LDPE    R0,IVTP
*       .
*       .
        RETI                    ; Return and enable interrupts
*
*       External IIOF0 interrupt service routine B
*
        .global EINT0B
EINT0B:
        LDI     0,R0            ; Change IVTP to point to 0
        LDPE    R0,IVTP
*       .
*       .
        RETI                    ; Return and enable interrupts
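The alternation produced by relocating the vector table can be traced with a small Python model. It is a sketch under stated assumptions: memory is a dictionary, the vector offset 03H and the table bases 0 and 1000H come from the example's comment block, and all Python names are hypothetical.

```python
IIOF0_OFFSET = 0x03  # offset of the IIOF0 vector, as stated in the example


def eint0a(state):
    state["ivtp"] = 0x1000  # next IIOF0 vector is fetched from 1003H
    return "A"


def eint0b(state):
    state["ivtp"] = 0x0000  # next IIOF0 vector is fetched from 03H
    return "B"


def take_iiof0(state, memory):
    """Fetch the handler at IVTP + offset and run it."""
    handler = memory[state["ivtp"] + IIOF0_OFFSET]
    return handler(state)


memory = {0x0003: eint0a, 0x1003: eint0b}
state = {"ivtp": 0}
served = [take_iiof0(state, memory) for _ in range(4)]
```

In this model `served` alternates between the two routines, because each handler leaves the IVTP pointing at the table that holds the other handler's address.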


12.2.3.3 Interrupt Priority 


Interrupts on the ’C40 are automatically prioritized. This allows interrupts
that occur simultaneously to be serviced in a predefined order. Infrequent
but lengthy interrupt service routines may need to be interrupted by more
frequently occurring interrupts. Since the GIE bit in the ST is reset when the
interrupt vector is taken, this nesting of interrupts occurs only if the new
interrupt is the NMI or if interrupts are re-enabled in the interrupt service
routine.




In Example 12-8, the interrupt service routine for INT2 temporarily modifies
the interrupt enable register (IIE) and interrupt flag register (IIF) to permit
interrupt processing when an interrupt to INT0 or NMI (but no other interrupt)
occurs. When the routine has finished processing, the IIE register is
restored to its original state. Notice that the RETIcond instruction not only
pops the next program counter address from the stack, but also restores the
GIE and CF bits from the PGIE and PCF bits. This re-enables all interrupts
that were enabled before the INT2 interrupt was serviced.


Example 12-8. Interrupt Service Routine

*
*       TITLE INTERRUPT SERVICE ROUTINE
*
        .global ISR2
*
ENABLE  .set    2000h
MASK    .set    9h
*
*       INTERRUPT PROCESSING FOR EXTERNAL INTERRUPT INT2-
*
ISR2:
        PUSH    ST              ; Save status register
        PUSH    DP              ; Save data page pointer
        PUSH    IIE             ; Save interrupt enable register
        PUSH    IIF             ; Save interrupt flag register
        PUSH    R0              ; Save lower 32 bits and
        PUSHF   R0              ; upper 32 bits of R0
        PUSH    R1              ; Save lower 32 bits and
        PUSHF   R1              ; upper 32 bits of R1
        LDI     0,IIE           ; Unmask all internal interrupts
        LDI     MASK,R0
        MH0     R0,IIF          ; Enable INT2
        OR      ENABLE,ST       ; Enable all interrupts
*
*       MAIN PROCESSING SECTION FOR ISR2
*
        XOR     ENABLE,ST       ; Disable all interrupts
        POPF    R1              ; Restore upper 32 bits and
        POP     R1              ; lower 32 bits of R1
        POPF    R0              ; Restore upper 32 bits and
        POP     R0              ; lower 32 bits of R0
        POP     IIF             ; Restore interrupt flag register
        POP     IIE             ; Restore interrupt enable register
        POP     DP              ; Restore data page register
        POP     ST              ; Restore status register
        RETI                    ; Return and enable interrupts




12.2.4 Delayed Branches 
The ’C40 uses delayed branches to create single-cycle branching. Delayed
branches operate like regular branches but do not flush the pipeline.
Instead, the three instructions following a delayed branch are also
executed. Besides delayed branches, the ’C40 also uses the link and jump
(LAJ), link and trap (LAT), delayed repeat block (RPTBD), and delayed
return from interrupt or trap conditionally (RETIcondD) instructions to avoid
the pipeline flush. As discussed in Section 6.3 on page 6-9 of the Program
Flow Control chapter (Chapter 6), the only limitation is that none of the
three instructions following a delayed branch can be any of the following:

    Branch (standard or delayed)
    Branch and annul conditionally
    Call to a subroutine
    Link and jump instruction
    Link and trap instruction
    Return from a subroutine
    Return from an interrupt or trap (standard or delayed)
    Repeat instruction (standard or delayed)
    TRAP instruction
    IDLE instruction


Conditional delayed branches use the conditions that exist at the end of the 
instruction immediately preceding the delayed branch. Sometimes, a 
branch is necessary in the flow of a program, but fewer than three 
instructions can be placed after a delayed branch. For faster execution, it 
is still advantageous to use a delayed branch. This is shown in 
Example 12-9, with a NOP taking the place of the third unused instruction. 
The tradeoff is more instruction words for less execution time. 
Example 12-9. Delayed Branch Execution

*
*       TITLE DELAYED BRANCH EXECUTION
*
        LDF     *+AR1(5),R2     ; Load contents of memory to R2
        BGED    SKIP            ; If loaded number >=0, branch (delayed)
        LDFN    R2,R1           ; If loaded number <0, load it to R1
        SUBF    3.0,R1          ; Subtract 3 from R1
        NOP                     ; Dummy operation to complete delayed
*                               ; branch
        MPYF    1.5,R1          ; Continue here if loaded number <0
SKIP    LDF     R1,R3           ; Continue here if loaded number >=0
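The effect of the three delay slots in Example 12-9 can be traced with a small Python model. This is a sketch only; the function name and the idea of passing the previous R1 value as a parameter are assumptions made for illustration. Each statement is labeled with the instruction it stands for.

```python
def delayed_branch_example(x, r1):
    """Return the value loaded into R3, given the memory operand x and the prior R1."""
    r2 = x                  # LDF   *+AR1(5),R2
    taken = r2 >= 0         # BGED  SKIP: condition is fixed by the preceding LDF
    if r2 < 0:              # LDFN  R2,R1   (delay slot 1, always fetched)
        r1 = r2
    r1 = r1 - 3.0           # SUBF  3.0,R1  (delay slot 2, always executed)
    # NOP                   # delay slot 3
    if not taken:
        r1 = 1.5 * r1       # MPYF  1.5,R1  (skipped when the branch is taken)
    return r1               # SKIP  LDF R1,R3
```

Note that the subtraction runs on both paths: with `x = 2.0` and a prior `r1 = 10.0` the model returns 7.0, while `x = -1.0` gives 1.5 * (-1.0 - 3.0) = -6.0.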


12.2.5 Repeat Modes 


The ’C40 supports looping without any overhead. For that purpose, there
are three instructions: RPTB and RPTBD repeat a block of code, and RPTS
repeats a single instruction. The three control registers

    RS (repeat start address),
    RE (repeat end address), and
    RC (repeat counter)

contain the parameters that specify loop execution (refer to Section 6.1 on
page 6-2 for a description of RPTB, RPTBD, and RPTS). Registers RS
and RE are set automatically from the code, while RC must be set by the
user, as shown in Example 12-10.


Example 12-10. Use of Block Repeat to Find a Maximum or a Minimum

*
*       TITLE USE OF BLOCK REPEAT TO FIND A MAXIMUM OR A MINIMUM
*
*       THIS ROUTINE FINDS THE MAXIMUM OR THE MINIMUM OF N=147 NUMBERS
*
        LDI     146,RC          ; Initialize repeat counter to 147-1
        LDI     @ADDR,AR0       ; AR0 points to the beginning
*                               ; of the array
        LDF     *AR0++(1),R0    ; Initialize MAX or MIN to the
*                               ; first value
        BLT     LOOP2           ; If it is a negative array, find the
*                               ; minimum
LOOP1   RPTB    MAX
        CMPF    *AR0,R0         ; Compare number to the maximum
MAX     LDFLT   *AR0,R0         ; If greater, this is a new maximum
        B       NEXT
LOOP2   RPTB    MIN
        CMPF    *AR0++(1),R0    ; Compare number to the minimum
MIN     LDFLT   *-AR0(1),R0     ; If smaller, this is a new minimum


12.2.5.1 Block Repeat 


The ’C40 supports both standard and delayed repeat block instructions 
(RPTB and RPTBD). RPTB and RPTBD are the same except that the three 
instructions following RPTBD are not included in the loop (but are included 
in the RPTB loop). For RPTBD, the loop starts at the fourth instruction 
following RPTBD. The restriction of these three following instructions is the 
same as that of the three instructions following a delayed branch. Since 
RPTBD is a single-cycle instruction, it is very useful in making the nesting 
loop program more efficient. Example 12-10 shows the use of the block 
repeat to find the maximum or the minimum value of 147 numbers. The 
elements of the array are either all positive or all negative numbers. Since 
the loop cannot be predetermined, the RPTBD instruction is not suitable 
here. 


12.2.5.2 Specific Restrictions in the Block-Repeat Construct


Because the program counter is modified at the end of the loop according 
to the contents of registers RS, RE, and RC, no operation should attempt 
to modify the repeat counter or the program counter at the end of the loop 
to a different value. 


In principle, it is possible to nest repeat blocks. However, there is only one 
set of control registers: RS, RE, and RC. It is, therefore, necessary to save 
these registers before entering an inside loop and to restore these registers 
after completing the inside loop. It takes four cycles overhead to save and 
restore these registers. Hence, sometimes, it may be more economical to 
implement a nested loop by the more traditional method of using a register 
as a counter, and then using a delayed branch rather than by using the 
nested repeat block approach. 
Example 12-11 shows an application of the delayed block repeat construct.
In this example, an array of 64 elements is flipped over by exchanging the
elements that are equidistant from the ends of the array. In other words, if
the original array is

    a(1), a(2),..., a(31), a(32),..., a(64);

the final array after the rearrangement will be

    a(64), a(63),..., a(32), a(31),..., a(1).

Because the exchange operation is done on two elements at the same time,
it requires 32 operations. The repeat counter (RC) is initialized to 31. In
general, if RC contains the number N, the loop will be executed N+1 times.
The loop is defined by the fourth instruction following the RPTBD instruction
and the EXCH label.


Example 12-11. Loop Using Delayed Block Repeat

*
*       TITLE LOOP USING DELAYED BLOCK REPEAT
*
*       THIS CODE SEGMENT EXCHANGES THE VALUES OF ARRAY ELEMENTS THAT
*       ARE SYMMETRIC AROUND THE MIDDLE OF THE ARRAY.
*
        LDI     @ADDR,AR0       ; AR0 points to the beginning of the
*                               ; array
        RPTBD   EXCH            ; Repeat RC+1 times between START and
*                               ; EXCH
        LDI     AR0,AR1
        ADDI    63,AR1          ; AR1 points to the end of the array
        LDI     31,RC           ; Initialize repeat counter
*       >>>>>>>>>>>>>>>         ; Loop starts here
        LDI     *AR0,R0         ; Load one memory element in R0,
        LDI     *AR1,R1         ; and the other in R1
        STI     R1,*AR0++(1)    ; Then, exchange their locations
EXCH    STI     R0,*AR1--(1)
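The exchange loop can be checked with a short Python model of the same pointer movement. It is an illustrative sketch; `flip_in_place` is a hypothetical name, and the two indices stand in for AR0 and AR1.

```python
def flip_in_place(a):
    """Reverse list a by exchanging elements equidistant from its ends."""
    lo, hi = 0, len(a) - 1      # AR0 points to the start, AR1 to the end
    rc = len(a) // 2 - 1        # repeat counter: the body runs RC + 1 times
    for _ in range(rc + 1):
        r0, r1 = a[lo], a[hi]   # load both elements
        a[lo], a[hi] = r1, r0   # store them exchanged
        lo += 1                 # post-increment AR0
        hi -= 1                 # post-decrement AR1
    return a
```

For a 64-element list the model computes `rc = 31`, which matches the repeat counter value used in the example.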
12.2.5.3 Single-Instruction Repeat


The single-instruction repeat uses control registers RS, RE, and RC in the 
same way as does the block repeat. The advantage over the block repeat 
is that the instruction is fetched only once, and then the buses are available 
for moving operands. One difference to note is that the single-instruction re- 
peat construct is not interruptible, while block repeat is interruptible. 


Example 12-12 shows an application of the repeat-single construct. In this 
example, the sum of the products of two arrays is computed. The arrays are 
not necessarily different. If the arrays are a(i) and b(i), and if each is of length 
N=512, register R0 will contain, after computation, this quantity:

    a(1) b(1) + a(2) b(2) + ... + a(N) b(N).

The value of the repeat counter (RC) is specified to be 511 in the instruction.


Example 12-12. Loop Using Single Repeat

*
*       TITLE LOOP USING SINGLE REPEAT
*
        LDI     @ADDR1,AR0      ; AR0 points to array a(i)
        LDI     @ADDR2,AR1      ; AR1 points to array b(i)
*
        LDF     0.0,R0          ; Initialize R0
*
        MPYF3   *AR0++(1),*AR1++(1),R1  ; Compute first product
*
        RPTS    511             ; Repeat 512 times
*
        MPYF3   *AR0++(1),*AR1++(1),R1  ; Compute next product and
||      ADDF3   R1,R0,R0                ; accumulate the
*                                       ; previous one
*
        ADDF    R1,R0           ; One final addition
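The multiply and the accumulate in the repeated instruction run in parallel, so each addition uses the product from the previous step. The Python sketch below models that one-step pipeline. The name `dot_pipelined` is hypothetical, and the repeat count is chosen here so that exactly N products are formed; it is not meant to mirror the RPTS operand in the listing.

```python
def dot_pipelined(a, b):
    """Sum of products with the accumulate lagging the multiply by one step."""
    r0 = 0.0                    # accumulator, initialized to zero
    r1 = a[0] * b[0]            # first product, computed before the loop
    for i in range(1, len(a)):  # repeated parallel multiply/add
        # Both halves read the old R1: add the previous product, form the next.
        r0, r1 = r0 + r1, a[i] * b[i]
    return r0 + r1              # one final addition for the last product
```

The tuple assignment evaluates both right-hand sides before storing, which is what makes it a fair stand-in for the parallel instruction pair.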
12.2.6 Computed GOTOs to Select Subroutines at Runtime


Occasionally, it is convenient to select the subroutine to be executed at
runtime rather than at assembly time. The ’C40’s computed GOTO supports
this selection. The computed GOTO is implemented by using the
CALLcond instruction in the register addressing mode. This instruction
uses the contents of the register as the address of the call. Example 12-13
shows the case of a task controller.


Example 12-13. Computed GOTO 


*
*       TITLE COMPUTED GOTO
*
*       TASK CONTROLLER
*
*       THIS MAIN ROUTINE CONTROLS THE ORDER OF TASK EXECUTION (6 TASKS
*       IN THE PRESENT EXAMPLE). TASK0 THROUGH TASK5 ARE THE NAMES OF
*       SUBROUTINES TO BE CALLED. THEY ARE EXECUTED IN ORDER, TASK0,
*       TASK1, . . . TASK5. WHEN AN INTERRUPT OCCURS, THE INTERRUPT
*       SERVICE ROUTINE IS EXECUTED, AND THE PROCESSOR CONTINUES
*       WITH THE INSTRUCTION FOLLOWING THE IDLE INSTRUCTION. THIS
*       ROUTINE SELECTS THE TASK APPROPRIATE FOR THE CURRENT CYCLE,
*       CALLS THE TASK AS A SUBROUTINE, AND BRANCHES BACK TO THE IDLE
*       TO WAIT FOR THE NEXT SAMPLE INTERRUPT WHEN THE SCHEDULED TASK
*       HAS COMPLETED EXECUTION. R0 HOLDS THE OFFSET FROM THE BASE
*       ADDRESS OF THE TASK TO BE EXECUTED. BIT 15 (SET COND BIT) OF
*       THE STATUS REGISTER (ST) SHOULD BE SET TO 1.
*
        LDI     5,IR0           ; Initialize IR0
        LDI     @ADDR,AR1       ; AR1 holds the base address
*                               ; of the table
WAIT    IDLE                    ; Wait for the next interrupt
        ADDI    *+AR1(IR0),R1   ; Add the base address to the table
*                               ; entry number
        SUBI    1,IR0           ; Decrement IR0
        LDILT   5,IR0           ; If IR0<0, reinitialize it to 5
        CALLU   R1              ; Execute appropriate task
        BR      WAIT
*
TSKSEQ  .word   TASK5           ; Address of TASK5
        .word   TASK4           ; Address of TASK4
        .word   TASK3           ; Address of TASK3
        .word   TASK2           ; Address of TASK2
        .word   TASK1           ; Address of TASK1
        .word   TASK0           ; Address of TASK0
ADDR    .word   TSKSEQ
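The table-driven selection can be sketched in Python with a list of callables in place of the address table. This is a model of the control flow only; `make_controller` and `on_wakeup` are hypothetical names, and one call to `on_wakeup` stands for one pass from IDLE back to IDLE.

```python
def make_controller(tasks):
    """tasks is [task0, ..., task5]; the table stores them last-to-first like TSKSEQ."""
    table = list(reversed(tasks))   # entry 0 is the last task, entry 5 is task0
    index = len(tasks) - 1          # plays the role of IR0, initialized to 5

    def on_wakeup():
        nonlocal index
        task = table[index]         # fetch the entry selected by the index
        index -= 1                  # decrement the index
        if index < 0:               # wrap around after the last entry
            index = len(tasks) - 1
        return task()               # call through the fetched entry

    return on_wakeup
```

Because the table is stored in reverse and the index counts down, successive wakeups run the tasks in ascending order and then start over.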
12.3 Logical and Arithmetic Operations


The ’C40 instruction set supports both integer and floating-point arithmetic
and logical operations. The basic functions of such instructions can be
combined to form more complex operations. This section examines examples
of these operations:

    Bit manipulation
    Block moves
    Byte and half-word manipulation
    Bit-reversed addressing
    Integer and floating-point division
    Square root
    Extended-precision arithmetic
    Floating-point format conversion between IEEE and ’C40 formats


12.3.1 Bit Manipulation 


Instructions for logical operations, such as AND, OR, NOT, ANDN, and
XOR, can be used together with the shift instructions for bit manipulation.
A special instruction, TSTB, tests bits. TSTB does the same operation as
AND, but the result of the TSTB is used only to set the condition flags and
is not written anywhere. Example 12-14 and Example 12-15 demonstrate
the use of several instructions for bit manipulation and testing.
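As a rough illustration of these operations, the Python sketch below models TSTB as an AND whose result is used only for a zero flag, and then uses it to copy one bit into another position. The function names are hypothetical, and 32-bit registers are emulated by masking.

```python
MASK32 = 0xFFFFFFFF


def tstb(src, dst):
    """Return the zero flag of src AND dst; neither operand is changed."""
    return (src & dst) == 0


def copy_bit(r1, r2, i, j):
    """Return r2 with bit j replaced by bit i of r1."""
    if tstb(r1, 1 << i):                # bit i of r1 is 0
        return r2 & ~(1 << j) & MASK32  # clear bit j (what ANDN does)
    return (r2 | (1 << j)) & MASK32     # otherwise set bit j (what OR does)
```

For instance, `copy_bit(0b1000, 0, 3, 0)` yields 1, because bit 3 of the first operand is set.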


Example 12-14. Use of TSTB for Software-Controlled Interrupt

*
*       TITLE USE OF TSTB FOR SOFTWARE-CONTROLLED INTERRUPT
*
*       IN THIS EXAMPLE, ALL INTERRUPTS HAVE BEEN DISABLED BY
*       RESETTING THE GIE BIT OF THE STATUS REGISTER. WHEN AN
*       INTERRUPT ARRIVES, IT IS STORED IN THE IIF REGISTER. THE
*       PRESENT EXAMPLE ACTIVATES THE INTERRUPT SERVICE ROUTINE INTR
*       WHEN IT DETECTS THAT INT2- HAS OCCURRED.
*
        TSTB    4,IIF           ; Check if bit 2 of IIF is set,
        CALLNZ  INTR            ; and, if so, call subroutine INTR
Example 12-15. Copy a Bit from One Location to Another

*
*       TITLE COPY A BIT FROM ONE LOCATION TO ANOTHER
*
*       BIT I OF R1 NEEDS TO BE COPIED TO BIT J OF R2.
*       AR0 POINTS TO A LOCATION HOLDING I, AND IT IS ASSUMED THAT THE
*       NEXT MEMORY LOCATION HOLDS THE VALUE J.
*
        LDI     1,R0
        LSH     *AR0,R0         ; Shift 1 to align it with bit I
        TSTB    R1,R0           ; Test the I-th bit of R1
        BZD     CONT            ; If bit = 0, branch delayed
        LDI     1,R0
        LSH     *+AR0(1),R0     ; Align 1 with J-th location
        ANDN    R0,R2           ; If bit = 0, reset J-th bit of R2
        OR      R0,R2           ; If bit = 1, set J-th bit of R2
CONT


12.3.2 Block Moves



Because the ’C40 directly addresses a large amount of memory, blocks of 
data or program code can be stored off-chip in slow memories and then 
loaded on-chip for faster execution. Data can also be moved from on-chip 
to off-chip for storage or for multiprocessor data transfers. 


Such data transfers can be accomplished efficiently in parallel with CPU 
operations using the DMA. The DMA operation is explained in detail in 
Chapter 9. An alternative to DMA is to perform data transfers under program 
control by using load and store instructions in a repeat mode. 
Example 12—16 shows the transfer of a block of 512 floating-point numbers 
from external memory to block 1 of the on-chip RAM. 
Example 12-16. Block Move Under Program Control

*
*       TITLE BLOCK MOVE UNDER PROGRAM CONTROL
*
extern  .word   01000H
block1  .word   02FFC00H
*
        LDI     @extern,AR0     ; Source address
        LDI     @block1,AR1     ; Destination address
        LDF     *AR0++,R0       ; Load the first number
        RPTS    510             ; Repeat following instruction
*                               ; 511 times
        LDF     *AR0++,R0       ; Load the next number, and...
||      STF     R0,*AR1++       ; store the previous one
        STF     R0,*AR1         ; Store the last number


12.3.3 Byte and Half-Word Manipulation 


A new set of instructions for byte and half-word access, such as
LB(3,2,1,0), LBU(3,2,1,0), LH(1,0), LHU(1,0), LWL(0,1,2,3), LWR(0,1,2,3),
MB(3,2,1,0), and MH(1,0), is available on the ’C40. In applications such as
image processing, it is often important to be able to manipulate packed data.
For example, the pixels in color images are often represented by four 8-bit
unsigned quantities (red, green, blue, and alpha) that are packed
into a single 32-bit word. The byte and half-word instructions make it
easy to manipulate this packed data.


Example 12-17 shows the case of packing data from a half-word FIFO into
32-bit data memory, and Example 12-18 shows the case of unpacking a
32-bit data array into four byte-wide data arrays (assuming each 32-bit word
contains four 8-bit unsigned numbers).


Example 12-17. Use of Packing Data From Half-Word FIFO to 32-Bit Data Memory

*
*       TITLE USE OF PACKING DATA FROM HALF-WORD FIFO
*             TO 32-BIT DATA MEMORY
*
*       IN THIS EXAMPLE, EVERY TWO 16-BIT INPUT DATA ARE PACKED
*       INTO ONE 32-BIT DATA MEMORY WORD. THE LOOP SIZE
*       USED HERE IS THE ARRAY SIZE, NOT THE INPUT DATA LENGTH.
*
        RPTBD   PACK
        LDI     @fifo_adr,AR1   ; Load fifo address
        LDI     @array,AR2      ; Load data array address
        LDI     size-1,RC       ; Load array size
*       >>>>>>>>>>>>>>>>        ; Loop starts here
        LWL0    *AR1,R9         ; Pack 16 LSBs
        LWL1    *AR1,R9         ; Pack 16 MSBs
PACK    STI     R9,*AR2++(1)    ; Store the data


Example 12-18. Use of Unpacking 32-Bit Data Into Four-Byte-Wide Data Array

*
*       TITLE USE OF UNPACKING 32-BIT DATA INTO FOUR BYTE-WIDE
*             DATA ARRAY
*
*       THIS EXAMPLE ASSUMES THAT THE 32-BIT DATA CONTAINS FOUR 8-BIT
*       UNSIGNED DATA.
*
        LDI     @input_adr,AR0  ; Load input address
        LDI     @array1,AR1     ; Load output data array 1 address
        LDI     @array2,AR2     ; Load output data array 2 address
        RPTBD   UNPACK
        LDI     @array3,AR3     ; Load output data array 3 address
        LDI     @array4,AR4     ; Load output data array 4 address
        LDI     size-1,RC       ; Load array size
*       >>>>>>>>>>>>>>>>        ; Loop starts here
        LBU0    *AR0,R8         ; Unpack first byte
        STI     R8,*AR1++(1)
        LBU1    *AR0,R8         ; Unpack second byte
        STI     R8,*AR2++(1)
        LBU2    *AR0,R8         ; Unpack third byte
        STI     R8,*AR3++(1)
        LBU3    *AR0++(1),R8    ; Unpack fourth byte
UNPACK  STI     R8,*AR4++(1)
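The data movement in the two examples can be modeled in Python with shifts and masks. This is a sketch, not a description of the instructions themselves: the function names are hypothetical, and it assumes that byte 0 and half-word 0 are the least significant fields of the 32-bit word.

```python
def pack_halfwords(first, second):
    """Pack two 16-bit values into one 32-bit word, first value in the 16 LSBs."""
    return ((second & 0xFFFF) << 16) | (first & 0xFFFF)


def unpack_bytes(word):
    """Split a 32-bit word into four unsigned bytes, least significant first."""
    return [(word >> (8 * k)) & 0xFF for k in range(4)]


def unpack_array(words):
    """Distribute the four bytes of each word over four output arrays."""
    arrays = [[], [], [], []]
    for word in words:
        for k, byte in enumerate(unpack_bytes(word)):
            arrays[k].append(byte)
    return arrays
```

With these assumptions, `unpack_bytes(0x44332211)` returns the bytes 0x11, 0x22, 0x33, 0x44 in that order.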


12.3.4 Bit-Reversed Addressing 


The ’C40 can implement fast Fourier transforms (FFT) with bit-reversed
addressing. If the data to be transformed is in the correct order, the final
result of the FFT is scrambled in bit-reversed order. To recover the
frequency-domain data in the correct order, certain memory locations must be
swapped. The bit-reversed addressing mode makes swapping unnecessary: the
next time the data needs to be accessed, the access is done in a bit-reversed
manner rather than sequentially. On the ’C40, bit-reversed addressing can be
implemented through both the CPU and the DMA.


In CPU bit-reversed addressing, IR0 holds a value equal to one-half the size
of the FFT if real and imaginary data are stored in separate arrays. During
accessing, the auxiliary register is indexed by IR0, but with reverse carry
propagation. Example 12-19 illustrates a 512-point complex FFT being
moved from the place of computation (pointed to by AR0) to a location
pointed to by AR1. In this example, the real and imaginary parts XR(i) and
XI(i) of the data are not stored in separate arrays; they are interleaved as
XR(0), XI(0), XR(1), XI(1), ..., XR(N-1), XI(N-1). Because of this
arrangement, the length of the array is 2N instead of N, and IR0 is set to 512
instead of 256.
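Indexing with reverse carry propagation can be illustrated with a small Python model. It is a sketch for a power-of-two buffer that starts at offset 0; the function names are hypothetical, and the addition is emulated by bit-reversing both operands, adding normally, and reversing the sum.

```python
def reverse_bits(x, nbits):
    """Reverse the lowest nbits bits of x."""
    return int(format(x, f"0{nbits}b")[::-1], 2)


def reverse_carry_add(addr, index, nbits):
    """Add index to addr with the carry propagating from MSB toward LSB."""
    total = reverse_bits(addr, nbits) + reverse_bits(index, nbits)
    return reverse_bits(total & ((1 << nbits) - 1), nbits)


def bit_reversed_sequence(size):
    """Offsets visited when stepping by size/2 with reverse carry."""
    nbits = size.bit_length() - 1
    addr, out = 0, []
    for _ in range(size):
        out.append(addr)
        addr = reverse_carry_add(addr, size // 2, nbits)
    return out
```

For an 8-point transform with separate real and imaginary arrays, the index value is 4 and the offsets are visited in the order 0, 4, 2, 6, 1, 5, 3, 7.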


Example 12-19. Bit-Reversed Addressing

*
*       TITLE BIT-REVERSED ADDRESSING
*
*       THIS EXAMPLE MOVES THE RESULT OF THE 512-POINT FFT
*       COMPUTATION, POINTED AT BY AR0, TO A LOCATION POINTED AT
*       BY AR1. REAL AND IMAGINARY POINTS ARE ALTERNATING.
*
        LDI     512,IR0
        RPTBD   LOOP
        LDI     2,IR1
        LDI     511,RC          ; Repeat 511+1 times
        LDF     *+AR0(1),R1     ; Load first imaginary point
*
        LDF     *AR0++(IR0)B,R0 ; Load real value (and point
||      STF     R1,*+AR1(1)     ; to next location) and store
*                               ; the imaginary value
LOOP    LDF     *+AR0(1),R1     ; Load next imaginary point
||      STF     R0,*AR1++(IR1)  ; and store previous real value


In DMA bit-reversed addressing, two bits in the DMA control register
enable bit-reversed addressing on DMA reads and DMA writes. The
source address index register and destination address index register are
used to define the size of the bit-reversed addressing. Their function is
similar to that of the CPU index register IR0. For more detailed information
about DMA operation, refer to Chapter 9.
12.3.5 Integer and Floating-Point Division


The ’C40 has a single-cycle instruction, RCPF, to generate an estimate of the
reciprocal of a floating-point number. This estimate has the correct
exponent, and the mantissa is accurate to the eighth binary place (the error
of the mantissa is < 2^-8). Often, this is a satisfactory estimate of the
reciprocal of a floating-point number. In other cases, this estimate may be
used as a seed for an algorithm that computes the reciprocal to even greater
accuracy. The Newton-Raphson algorithm described later is one such case.


For integer division, although no special instruction is provided, the
instruction set allows an efficient division routine to be written. In
addition, a rough estimate can be obtained with the FLOAT, RCPF, and FIX
instructions.
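How a coarse seed is refined can be sketched in Python. The seed function below is a hypothetical stand-in that truncates the true reciprocal to eight fractional mantissa bits; it is not the RCPF instruction. The refinement is the standard Newton-Raphson step x = x(2 - vx), which roughly squares the relative error on each pass.

```python
import math


def coarse_reciprocal(v):
    """Hypothetical seed: 1/v with the mantissa truncated to 8 fractional bits."""
    m, e = math.frexp(1.0 / v)      # 1/v = m * 2**e with 0.5 <= |m| < 1
    m = math.floor(m * 512) / 512   # 9 bits here equal 8 fractional bits of 1.x
    return math.ldexp(m, e)


def reciprocal(v, iterations=2):
    """Refine the coarse seed with Newton-Raphson steps."""
    x = coarse_reciprocal(v)
    for _ in range(iterations):
        x = x * (2.0 - v * x)       # error e becomes roughly e squared
    return x
```

Starting from a relative error below 2^-8, one step gives about 2^-16 and a second step about 2^-32, so two steps are usually enough for a 32-bit floating-point mantissa.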


12.3.5.1 Integer Division 


Division is implemented on the ’C40 by repeated subtractions using SUBC, 
a special conditional subtract instruction. Consider the case of a 32-bit posi- 
tive dividend with i significant bits (and 32-i sign bits), and a 32-bit positive 
divisor with j significant bits (and 32-j sign bits). The repetition of the SUBC 
command i-j+1 times produces a 32-bit result where the lower i-j+1 bits are 
the quotient, and the upper 31-i+j bits are the remainder of the division. 


SUBC implements binary division in the same manner as long division. The
divisor (assumed to be smaller than the dividend) is shifted left i-j times to
be aligned with the dividend. Then, using SUBC, the shifted divisor is
subtracted from the dividend. For each subtraction that does not produce a
negative answer, the dividend is replaced by the difference, which is then
shifted left by one with a one put in the LSB. If the difference is negative,
the dividend is simply shifted left by one. This operation is repeated i-j+1
times.


As an example, consider the division of 33 by 5 using both long division and
the SUBC method. In this case, i=6, j=3, and the SUBC operation is
repeated 6-3+1=4 times.
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00000000000000000000000000000101 
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LONG DIVISION: 


SUBC METHOD: 


00000000000000000000000000100001 


Q0000000000000000000000000101000 


Negative difference 


J 


00000000000000000000000000100001 
00000000000000000000000000107000 


00000000000000000000000000011010 


00000000000000000000000000100001 


00000000000000000000000000101000 


00000000000000000000000000011010 


00000000000000000000000000011011 


00000000000000000000000000101000 


Negative difference 
00000000000000000000000000110110 _ 
Quot. 


Remainder 


00000000000000000000000000000110 


Quotient 
00000000000000000000000000100001 
-101 
101 
Sa Remainder 


Dividend 
Divisor (a : ligned) 
(1st SUBC command) 


New Dividend + Quotient 


Divisor 
Difference (>0) (2nd SUBC command) 


sais Dividend + Quotient 
iviso 
Difference (>0) (8rd SUBC command) 


New Dividend + Quotient 
Divisor 


_ (4th SUBC command) 


Final Result 
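The SUBC procedure traced above can be modeled in a few lines of Python. This is only a sketch for checking the arithmetic, not ’C40 code: the function names subc and subc_divide are hypothetical, and the model assumes positive operands below 2^31, as the text requires.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register


def subc(src, dst):
    """One conditional-subtract step; returns the new value of dst."""
    diff = dst - src
    if diff >= 0:
        # Non-negative difference: keep it, shift left, put a one in the LSB.
        return ((diff << 1) | 1) & MASK32
    # Negative difference: the dividend is simply shifted left by one.
    return (dst << 1) & MASK32


def subc_divide(dividend, divisor):
    """Return (quotient, remainder) using i-j+1 conditional subtracts."""
    if not (0 < divisor < 2**31 and 0 <= dividend < 2**31):
        raise ValueError("operands must be positive and below 2**31")
    if dividend < divisor:
        return 0, dividend
    count = dividend.bit_length() - divisor.bit_length()  # i - j
    aligned = divisor << count                            # align the divisor
    result = dividend
    for _ in range(count + 1):                            # i - j + 1 steps
        result = subc(aligned, result)
    quotient = result & ((1 << (count + 1)) - 1)          # lower i-j+1 bits
    remainder = result >> (count + 1)                     # upper bits
    return quotient, remainder
```

For the example in the text, subc_divide(33, 5) passes through the intermediate values 66, 53, 27, and 54 before the quotient and remainder are split out of the final word.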


When the SUBC command is used, both the dividend and the divisor must
be positive. Example 12-20 shows a realization of the integer division in
which the sign of the quotient is properly handled. The last instruction before
returning modifies the condition flag in case subsequent operations depend
on the sign of the result.
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Example 12-20. Integer Division 


*
*  TITLE INTEGER DIVISION
*
*  SUBROUTINE DIVI
*
*  INPUTS:   SIGNED INTEGER DIVIDEND IN R0,
*            SIGNED INTEGER DIVISOR IN R1.
*
*  OUTPUT:   R0/R1 into R0.
*
*  REGISTERS USED: R0-R3, IR0, IR1
*
*  OPERATION: 1. NORMALIZE DIVISOR WITH DIVIDEND
*             2. REPEAT SUBC
*             3. QUOTIENT IS IN LSBs OF RESULT
*
*  CYCLES: 31-62 (DEPENDS ON AMOUNT OF NORMALIZATION)
*
        .globl  DIVI
*
SIGN    .set    R2
TEMPF   .set    R3
TEMP    .set    IR0
COUNT   .set    IR1
*
*  DIVI - SIGNED DIVISION
*
DIVI:
*
*  DETERMINE SIGN OF RESULT. GET ABSOLUTE VALUE OF OPERANDS.
*
        XOR     R0,R1,SIGN      ; Get the sign
        ABSI    R0
        ABSI    R1
        CMPI    R0,R1           ; Divisor > dividend ?
        BGTD    ZERO            ; If so, return 0
*
*  NORMALIZE OPERANDS. USE DIFFERENCE IN EXPONENTS AS SHIFT COUNT
*  FOR DIVISOR, AND AS REPEAT COUNT FOR 'SUBC'.
*
        FLOAT   R0,TEMPF        ; Normalize dividend
        PUSHF   TEMPF           ; PUSH as float
        POP     COUNT           ; POP as int
        LSH     -24,COUNT       ; Get dividend exponent
        FLOAT   R1,TEMPF        ; Normalize divisor
        PUSHF   TEMPF           ; PUSH as float
        POP     TEMP            ; POP as int
        LSH     -24,TEMP        ; Get divisor exponent
        SUBI    TEMP,COUNT      ; Get difference in exponents
        LSH     COUNT,R1        ; Align divisor with dividend
*
*  DO COUNT+1 SUBTRACT & SHIFTS.
*
        RPTS    COUNT
        SUBC    R1,R0
*
*  MASK OFF THE LOWER COUNT+1 BITS OF R0
*
        SUBRI   31,COUNT        ; Shift count is (32 - (COUNT+1))
        LSH     COUNT,R0        ; Shift left
        NEGI    COUNT
        LSH     COUNT,R0        ; Shift right to get result
*
*  CHECK SIGN AND NEGATE RESULT IF NECESSARY.
*
        NEGI    R0,R1           ; Negate result
        ASH     -31,SIGN        ; Check sign
        LDINZ   R1,R0           ; If set, use negative result
        CMPI    0,R0            ; Set status from result
        RETS
*
*  RETURN ZERO.
*
ZERO:
        LDI     0,R0
        RETS
        .end


If the dividend is less than the divisor and you want fractional division, you 
can perform a division after you determine the desired accuracy of the quo- 
tient in bits. If the desired accuracy is k bits, start by shifting the dividend 
left by k positions. Then apply the algorithm described above, where i should 
now be replaced by i + k. It is assumed that i + k is less than 32. 


12.3.5.2 Computation of Floating-Point Inverse and Division 


This section describes the single-cycle RCPF instruction (reciprocal of a
floating-point number) together with an algorithm that extends the
precision of the mantissa of the reciprocal generated by RCPF.
Floating-point division can then be obtained by multiplying the dividend
by the reciprocal of the divisor.


The input to RCPF is assumed to be v = v(man) x 2^v(exp). The output is
x = x(man) x 2^x(exp). The value v(man) (or x(man)) is composed of three
fields: the sign bit v(sign), an implied nonsign bit, and the fraction field
v(frac).


The algorithm for RCPF uses these four rules:

1) If v > 0, then x(exp) = -v(exp) - 1 and x(man) = 2/v(man).
   For the special case where the ten MSBs of v(man) =
   01.00000000b, then x(man) = 2 - 2^-8 = 01.11111111b. In both
   cases, the 23 LSBs of x(frac) = 0.

2) If v < 0, then x(exp) = -v(exp) - 1 and x(man) = 2/v(man).
   For the special case of the ten MSBs of v(man) = 10.00000000b, then
   x(man) = -1 - 2^-8 = 10.11111111b. In both cases, the 23 LSBs of
   x(frac) = 0.

3) If v = 0 (v(exp) = -128), then x(exp) = 127 and
   x(man) = 01.11111111111111111111111111111111b.
   In other words, if v = 0, then x becomes the largest positive number
   representable in the extended-precision floating-point format. The
   overflow flag (V) is set to 1.

4) If v(exp) = 127, then x(exp) = -128 and x(man) = 0.
   The zero flag (Z) is set to 1.


The RCPF instruction gives an estimate of the reciprocal of a number. The
Newton-Raphson algorithm may be used to further extend the precision of
the mantissa. The algorithm is

     x[n+1] = x[n](2.0 - v x[n])

where v is the number for which the reciprocal is desired. x[0] is the seed
for the algorithm and is given by RCPF. At every iteration of the algorithm,
the number of bits of accuracy in the mantissa doubles. Using RCPF,
accuracy starts at eight bits. With one iteration, accuracy increases to
16 bits, and with the second iteration, accuracy increases to 32 bits in the
mantissa. Example 12-21 shows the program to implement this algorithm on
the ’C40.
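The doubling of accuracy per iteration can be checked numerically with a small Python model. This is a sketch under stated assumptions: rcpf_seed is a hypothetical stand-in for RCPF that simply truncates 1/v to eight fraction bits, so it is not bit-exact with the hardware, and the arithmetic is done in Python double precision rather than in the ’C40 extended-precision format.

```python
import math


def rcpf_seed(v):
    """Hypothetical RCPF stand-in: 1/v with the mantissa cut to 8 fraction bits."""
    m, e = math.frexp(1.0 / v)            # 1/v = m * 2**e with 0.5 <= |m| < 1
    m = math.floor(m * 2**9) / 2**9       # leading one plus 8 fraction bits
    return math.ldexp(m, e)


def newton_reciprocal(v, iterations=2):
    """Refine the seed with x[n+1] = x[n] * (2.0 - v * x[n])."""
    x = rcpf_seed(v)
    for _ in range(iterations):
        x = x * (2.0 - v * x)
    return x


if __name__ == "__main__":
    v = 3.0
    for n in range(3):
        err = abs(newton_reciprocal(v, n) * v - 1.0)
        print(n, "iterations, relative error", err)
```

If the seed has relative error d, one step leaves an error of d squared, which is why eight accurate bits become roughly sixteen and then thirty-two.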


Example 12-21. Inverse of a Floating-Point Number With 32-Bit Mantissa Accuracy

*
*  TITLE INVERSE OF A FLOATING-POINT NUMBER
*        WITH 32-BIT MANTISSA ACCURACY
*
*  SUBROUTINE INVF
*
*  THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE
*  COMPUTATION IS COMPLETED, 1/v IS STORED IN R1.
*
*  TYPICAL CALLING SEQUENCE:
*
*       LAJU    INVF
*       LDF     v,R0
*       NOP             <---- can be other non-pipeline-break
*       NOP             <---- instructions
*
*  ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+-------------------------------------
*   R0       | v = NUMBER TO FIND THE RECIPROCAL OF
*            |     (UPON THE CALL)
*   R1       | 1/v (UPON THE RETURN)
*
*  REGISTER USED AS INPUT: R0
*  REGISTERS MODIFIED: R1, R2
*  REGISTER CONTAINING RESULT: R1
*  REGISTER USED FOR SUBROUTINE CALL: R11
*
*  CYCLES: 8    WORDS: 8
*
        .global INVF
*
INVF:   RCPF    R0,R1           ; Get x[0] = the estimate of 1/v, R0 = v
*
        MPYF3   R1,R0,R2
        SUBRF   2.0,R2
        MPYF    R2,R1           ; End of first iteration
*                               ; (16 bits accuracy)
        BUD     R11             ; Delayed return to caller
*
        MPYF3   R1,R0,R2
        SUBRF   2.0,R2
        MPYF    R2,R1           ; End of second iteration
*                               ; (32 bits accuracy)
*
*  R1 = 1/v, Return to caller
*
        .end


12.3.6 Square Root 
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In many applications, normalization of data values is necessary. Often, the 
normalizing factor is the square root of another quantity. For example, given 
a vector, the unit vector in the same direction as the original vector can be 
found by normalizing the original vector by the length of the vector. This 
involves a division by a square root. The ’C40 provides a single-cycle 
instruction, RSQRF, to generate an estimate of the reciprocal of the square 
root of a positive floating-point number. This estimate has the correct 
exponent, and the mantissa is accurate to the eighth binary place (the error
of the mantissa is < 2^-8). Like the algorithm for RCPF, the algorithm for
RSQRF uses these three rules:


1) If v(exp) is even, then x(exp) = -(v(exp)/2) - 1 and
   x(man) = 2/sqrt(v(man)). For the special case where the ten MSBs of
   v(man) = 01.00000000b, then x(man) = 2 - 2^-8 = 01.11111111b.
   In both cases, the 23 LSBs of x(frac) = 0.

2) If v(exp) is odd, then x(exp) = -((v(exp) - 1)/2) - 1 and
   x(man) = sqrt(2/v(man)). The 23 LSBs of x(frac) = 0.

3) If v = 0 (v(exp) = -128), then x(exp) = 127 and
   x(man) = 01.11111111111111111111111111111111b.
   In other words, if v = 0, then x becomes the largest positive number
   representable in the extended-precision floating-point format. The
   overflow flag (V) is set to 1.


Once the RSQRF instruction gives the estimate of the reciprocal of the 
square root, you can use the Newton-Raphson algorithm to further extend 
the precision of the mantissa. The algorithm is 


     x[n+1] = x[n](1.5 - (v/2) x[n] x[n])


where v is the number for which the reciprocal of the square root is
desired. x[0] is the seed for the algorithm and is given by RSQRF. At every
iteration of the algorithm, the number of bits of accuracy in the mantissa
doubles. Using RSQRF, accuracy starts at eight bits. With one iteration,
accuracy increases to 16 bits, and with the second iteration, accuracy
increases to 32 bits in the mantissa. Example 12-22 shows the program to
implement this algorithm on the ’C40.
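The same kind of numerical check works for the reciprocal square root. In this Python sketch, rsqrf_seed is a hypothetical stand-in for RSQRF (it truncates 1/sqrt(v) to eight fraction bits and is not bit-exact with the instruction), and v/2 is computed once before the loop, in the same way that Example 12-22 scales R0 by 0.5 before iterating.

```python
import math


def rsqrf_seed(v):
    """Hypothetical RSQRF stand-in: 1/sqrt(v) cut to 8 fraction bits."""
    if v <= 0.0:
        raise ValueError("v must be positive")
    m, e = math.frexp(1.0 / math.sqrt(v))   # value = m * 2**e, 0.5 <= m < 1
    m = math.floor(m * 2**9) / 2**9         # leading one plus 8 fraction bits
    return math.ldexp(m, e)


def newton_rsqrt(v, iterations=2):
    """Refine the seed with x[n+1] = x[n] * (1.5 - (v/2) * x[n] * x[n])."""
    x = rsqrf_seed(v)
    half_v = 0.5 * v                        # computed once, outside the loop
    for _ in range(iterations):
        x = x * (1.5 - half_v * x * x)
    return x


def newton_sqrt(v):
    """Square root via sqrt(v) = v * (1/sqrt(v))."""
    return v * newton_rsqrt(v)
```

With a seed error of d, one step leaves an error of about 1.5 times d squared, so the accuracy roughly doubles per iteration but lands slightly short of an exact doubling.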


Example 12-22. Reciprocal of the Square Root of a Positive Floating-Point Number

*
*  TITLE RECIPROCAL OF THE SQUARE ROOT OF A POSITIVE
*        FLOATING-POINT
*
*  SUBROUTINE RCPSQRF
*
*  THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE
*  COMPUTATION IS COMPLETED, 1/SQRT(v) IS STORED IN R1.
*
*  TYPICAL CALLING SEQUENCE:
*
*       LDF     v,R0
*       LAJU    RCPSQRF
*
*  ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+-------------------------------------
*   R0       | v = NUMBER TO FIND THE RECIPROCAL OF
*            |     (UPON THE CALL)
*   R1       | 1/sqrt(v) (UPON THE RETURN)
*
*  REGISTER USED AS INPUT: R0
*  REGISTERS MODIFIED: R1, R2
*  REGISTER CONTAINING RESULT: R1
*  REGISTER USED FOR SUBROUTINE CALL: R11
*
*  CYCLES: 11    WORDS: 11
*
        .global RCPSQRF
*
RCPSQRF: RSQRF  R0,R1           ; Get x[0] = the estimate of
*                               ; 1/sqrt(v), R0 = v
        MPYF    0.5,R0          ; R0 = v/2
*
        MPYF3   R1,R1,R2        ; First iteration
        MPYF    R0,R2
        SUBRF   1.5,R2
        MPYF    R2,R1           ; End of first iteration
*                               ; (16 bits accuracy)
*
        MPYF3   R1,R1,R2        ; Second iteration
*
        BUD     R11             ; Delayed return to caller
*
        MPYF    R0,R2
        SUBRF   1.5,R2
        MPYF    R2,R1           ; End of second iteration
*                               ; (32 bits accuracy)
*
*  R1 = 1/SQRT(v), Return to caller
*
        .end


Of course, the square root is found by a simple multiplication: sqrt(v) = v x[n],
where x[n] is the estimate of 1/sqrt(v) as determined by the Newton-Raphson
algorithm or some other algorithm.
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12.3.7 Extended-Precision Arithmetic 


The TMS320C40 offers 32 bits of precision for integer arithmetic, and 24 bits
of precision in the mantissa for floating-point arithmetic. For higher precision
in floating-point operations, the twelve extended-precision registers R0 to
R11 contain eight more bits of accuracy. Since no comparable extension is
available for fixed-point arithmetic, this section discusses how fixed-point
double precision can be achieved by using the capabilities of the processor.
The technique consists of performing the arithmetic by parts and is similar
to the way in which longhand arithmetic is done.


In the instruction set, operations ADDC (add with carry) and SUBB (subtract
with borrow) use the status carry bit for extended-precision arithmetic. The
carry bit is affected by the arithmetic operations of the ALU and by the rotate
and shift instructions. It can also be manipulated directly by setting the
status register to certain values. For proper operation, the overflow mode bit
should be reset (OVM = 0) so that the accumulator results will not be loaded
with the saturation values. Example 12-23 and Example 12-24 show 64-bit
addition and 64-bit subtraction. The first operand is stored in the registers
R0 (low word) and R1 (high word). The second operand is stored in R2 and
R3, respectively. The result is stored in R0 and R1.


Example 12-23. 64-Bit Addition 


*
*  TITLE 64-BIT ADDITION
*
*  TWO 64-BIT NUMBERS ARE ADDED TO EACH OTHER, PRODUCING
*  A 64-BIT RESULT. THE NUMBERS X (R1,R0) AND Y (R3,R2) ARE
*  ADDED, RESULTING IN W (R1,R0).
*
*          R1 R0
*        + R3 R2
*        -------
*          R1 R0
*
        ADDI    R2,R0
        ADDC    R3,R1
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Example 12-24. 64-Bit Subtraction 


*
*  TITLE 64-BIT SUBTRACTION
*
*  TWO 64-BIT NUMBERS ARE SUBTRACTED FROM EACH OTHER,
*  PRODUCING A 64-BIT RESULT. THE NUMBERS X (R1,R0) AND
*  Y (R3,R2) ARE SUBTRACTED, RESULTING IN W (R1,R0).
*
*          R1 R0
*        - R3 R2
*        -------
*          R1 R0
*
        SUBI    R2,R0
        SUBB    R3,R1
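The arithmetic-by-parts idea behind Example 12-23 and Example 12-24 can be sketched in Python by modeling each register as a 32-bit word. The function names add64 and sub64 are hypothetical, and the model assumes OVM = 0 (no saturation), as the text requires.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit register


def add64(x_hi, x_lo, y_hi, y_lo):
    """64-bit add by parts; returns (high word, low word)."""
    lo = x_lo + y_lo                        # ADDI: add the low words
    carry = lo >> 32                        # carry out of the low word
    hi = (x_hi + y_hi + carry) & MASK32     # ADDC: add high words plus carry
    return hi, lo & MASK32


def sub64(x_hi, x_lo, y_hi, y_lo):
    """64-bit subtract by parts; returns (high word, low word)."""
    lo = x_lo - y_lo                        # SUBI: subtract the low words
    borrow = 1 if lo < 0 else 0             # borrow out of the low word
    hi = (x_hi - y_hi - borrow) & MASK32    # SUBB: subtract with borrow
    return hi, lo & MASK32
```

For instance, adding 1 to the 64-bit value 0x00000000FFFFFFFF carries into the high word, and subtracting 1 from 0x0000000100000000 borrows from it.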


When two 32-bit numbers are multiplied, a 64-bit product results. To support
this, the ’C40 provides a 32 x 32-bit multiplier and two special
instructions, MPYSHI (multiply signed integer and produce 32 MSBs) and
MPYUHI (multiply unsigned integer and produce 32 MSBs). Example 12-25
shows the implementation of a 32-bit by 32-bit multiplication.


Example 12-25. 32-Bit by 32-Bit Multiplication 


*
*  TITLE 32 x 32-BIT MULTIPLICATION
*
*  TWO 32-BIT NUMBERS ARE MULTIPLIED, PRODUCING A 64-BIT RESULT.
*  THE TWO NUMBERS X (R0) AND Y (R1) ARE MULTIPLIED, RESULTING
*  IN W (R3,R2).
*
*             R0
*        x    R1
*        -------
*          R3 R2
*
        MPYI3   R0,R1,R2
        MPYSHI3 R0,R1,R3
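The split of the product into two words can be modeled in Python as follows. This sketch assumes that MPYI3 leaves the 32 least significant bits of the signed product in R2 (with OVM = 0) and that MPYSHI3 leaves the 32 most significant bits in R3, which is how Example 12-25 uses them; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(word):
    """Interpret a 32-bit word as a twos-complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def mul32x32_signed(x_word, y_word):
    """Return (high word, low word) of the signed 64-bit product."""
    product = to_signed32(x_word) * to_signed32(y_word)
    lo = product & MASK32            # assumed MPYI3 result (32 LSBs)
    hi = (product >> 32) & MASK32    # MPYSHI3 result (32 MSBs)
    return hi, lo
```

Multiplying -1 (0xFFFFFFFF) by 2 gives the 64-bit value -2, so both words come back with the sign extended through the high word.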


12.3.8 Floating-Point Format Conversion: IEEE to/from TMS320C40 
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In fixed-point arithmetic, the binary point that separates the integer from the 
fractional part of the number is fixed at a certain location. For example, if a
32-bit number has the binary point after the most significant bit (which is also
the sign bit), only fractional numbers (numbers with absolute values less
than 1) can be represented. Such a number, with 31 fractional bits, is called
a Q31 number. All operations assume that the binary point is fixed
at this location. The fixed-point system, although simple to implement in
hardware, imposes limitations in the dynamic range of the represented
number. This causes scaling problems in many applications. You can avoid
this difficulty by using floating-point numbers.


A floating-point number consists of a mantissa m multiplied by base b raised
to an exponent e:


     m * b^e


In current hardware implementations, the mantissa is typically a normalized
number with an absolute value between 1 and 2, and the base is b = 2.
Although the mantissa is represented as a fixed-point number, the actual
value of the overall number floats the binary point because of the
multiplication by b^e. The exponent e is an integer whose value determines
the position of the binary point in the number. IEEE has established a
standard format for the representation of floating-point numbers.


To achieve higher efficiency in the hardware implementation, the ’C40 uses
a floating-point format that differs from the IEEE standard. However, the
’C40 has two single-cycle instructions, TOIEEE and FRIEEE, for the format
conversion. These two instructions can also be used in parallel with the STF
instruction, which allows the data format to be converted during a
memory-to-memory transfer. This subsection briefly describes the two
formats and presents an example program to convert between them.


TMS320C40 floating-point format:

       8 bits     1        23 bits
    +----------+-----+---------------+
    |    e     |  s  |       f       |
    +----------+-----+---------------+


In a 32-bit word representing a floating-point number, the first 8 bits corre- 
spond to the exponent expressed in twos-complement format. One bit is for 
sign, and 23 bits are for the mantissa. The mantissa is expressed in twos- 
complement form with the binary point after the most significant nonsign bit. 
Since this bit is the complement of the sign bit s, it is suppressed. In other
words, the mantissa actually has 24 bits. One special case occurs when 
e = —128. In this case, the number is interpreted as zero, independently of 
the values of s and f (which are by default set to zero). To summarize, the 
values of the represented numbers in the ’C40 floating-point format are as 
follows: 


     2^e * (01.f)     if s = 0
     2^e * (10.f)     if s = 1
     0                if e = -128
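The three cases above translate directly into a small decoder. The following Python sketch (the function name c40_decode is hypothetical) takes a 32-bit word laid out as e, s, f and returns its value, reading 01.f as 1 + f/2^23 and the twos-complement pattern 10.f as -2 + f/2^23.

```python
def c40_decode(word):
    """Decode a 32-bit word in the 'C40 single-precision format."""
    word &= 0xFFFFFFFF
    e = word >> 24                    # 8-bit exponent field
    if e >= 128:
        e -= 256                      # twos-complement exponent
    s = (word >> 23) & 1              # sign bit
    f = word & 0x7FFFFF               # 23-bit fraction
    if e == -128:
        return 0.0                    # special case: zero
    # 01.f for positive numbers, 10.f (twos complement) for negative ones
    mantissa = (-2.0 if s else 1.0) + f / 2**23
    return mantissa * 2.0**e
```

Under this reading, the all-zero word decodes to 1.0 (e = 0, s = 0, f = 0), while zero itself is the word 0x80000000.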
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IEEE floating-point format: 
       1      8 bits         23 bits
    +-----+----------+---------------+
    |  s  |    e     |       f       |
    +-----+----------+---------------+


The IEEE floating-point format uses sign-magnitude notation for the mantissa,
and offset by 127 for the exponent. In a 32-bit word representing a
floating-point number, the first bit is the sign bit. The next 8 bits correspond to
the exponent, expressed in an offset-by-127 format (the actual exponent is 
e-127). The following 23 bits represent the absolute value of the mantissa 
with the most significant 1 implied. The binary point is after this most signifi- 
cant 1. In other words, the mantissa actually has 24 bits. There are several 
special cases, summarized below. 


These are values of the represented numbers in the IEEE floating-point for- 
mat: 


     (-1)^s * 2^(e-127) * (01.f)     if 0 < e < 255

Special cases:

     (-1)^s * 0.0                    if e = 0 and f = 0 (zero)
     (-1)^s * 2^-126 * (0.f)         if e = 0 and f <> 0 (denormalized)
     (-1)^s * infinity               if e = 255 and f = 0 (infinity)
     NaN (not a number)              if e = 255 and f <> 0


Based on these definitions of the formats, the ’C40 performs the conversion
in hardware. It assumes that the source data for the IEEE format
is in memory only and that for the ’C40 floating-point format, the source data
is in either memory or an extended-precision register. The destination for
both conversions must be an extended-precision register. In the case of a
block memory transfer, the data format can be converted with no penalty by
placing the conversion instruction in parallel with STF. Example 12-26 and
Example 12-27 show the data format conversion during a data transfer
between a communication port and internal RAM.
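The relationship between the two formats can be illustrated with a Python sketch that converts IEEE single-precision bits into a ’C40-format word. It is derived only from the value definitions given above and handles normalized numbers and zero; how FRIEEE treats denormalized numbers, infinities, and NaNs is not covered here, so the sketch rejects them. The function names are hypothetical.

```python
import struct


def ieee_bits(x):
    """Return the IEEE single-precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def ieee_to_c40(bits):
    """Convert IEEE single-precision bits to a 'C40-format word."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 0 and f == 0:
        return 0x80000000                 # zero: exponent field = -128
    if e == 0 or e == 255:
        raise ValueError("denormal, infinity, and NaN are not modeled")
    exp = e - 127                         # remove the offset
    if s == 0:
        sign, frac = 0, f                 # +1.f maps directly to 01.f
    elif f == 0:
        exp -= 1                          # -1.0 * 2^exp = -2.0 * 2^(exp-1)
        sign, frac = 1, 0
    else:
        sign, frac = 1, (1 << 23) - f     # -(1.f) = -2 + (1 - 0.f)
    return ((exp & 0xFF) << 24) | (sign << 23) | frac
```

The negative case is the only subtle one: because the ’C40 mantissa is twos complement, a negative power of two is stored with the mantissa -2 and an exponent one lower than the IEEE exponent.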
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Example 12-26. IEEE to TMS320C40 Conversion Within Block Memory Transfer 


*
*  TITLE IEEE TO TMS320C40 CONVERSION WITHIN BLOCK MEMORY
*        TRANSFER
*
*  PROGRAM ASSUMES THAT THE INPUT FIFO OF COMMUNICATION PORT 0
*  IS FULL OF IEEE FORMAT DATA. EIGHT DATA ARE TRANSFERRED FROM
*  COMMUNICATION PORT 0 TO INTERNAL RAM BLOCK 0 AND THE DATA
*  FORMAT IS CONVERTED FROM IEEE FORMAT TO TMS320C40 FLOATING-
*  POINT FORMAT.
*
        LDI     @CP0_IN,AR0     ; Load comm. port 0 input FIFO address
        LDI     @RAM0,AR1       ; Load internal RAM block 0 address
        FRIEEE  *AR0,R0         ; Convert first data
        RPTS    6
        FRIEEE  *AR0,R0         ; Convert next data
||      STF     R0,*AR1++(1)    ; Store previous data
        STF     R0,*AR1++(1)    ; Store last data


Example 12-27. TMS320C40 to IEEE Conversion Within Block Memory Transfer 


*
*  TITLE TMS320C40 TO IEEE CONVERSION WITHIN BLOCK MEMORY
*        TRANSFER
*
*  PROGRAM ASSUMES THAT THE OUTPUT FIFO OF COMMUNICATION PORT 0
*  IS EMPTY. EIGHT DATA ARE TRANSFERRED FROM INTERNAL RAM BLOCK 0
*  TO COMMUNICATION PORT 0 AND THE DATA FORMAT IS CONVERTED FROM
*  TMS320C40 FLOATING-POINT FORMAT TO IEEE FORMAT.
*
        LDI     @CP0_OUT,AR0    ; Load comm. port 0 output FIFO address
        LDI     @RAM0,AR1       ; Load internal RAM block 0 address
        TOIEEE  *AR1++(1),R0    ; Convert first data
        RPTS    6
        TOIEEE  *AR1++(1),R0    ; Convert next data
||      STF     R0,*AR0         ; Store previous data
        STF     R0,*AR0         ; Store last data
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12.4 Application-Oriented Operations 


Certain features of the ’C40 architecture and instruction set facilitate the so- 
lution of numerically intensive problems. This section presents examples 
of applications that use these features, such as companding, filtering, matrix 
arithmetic, and fast Fourier transforms (FFT). 


12.4.1 Companding 
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In the area of telecommunications, one of the primary concerns is to 
conserve the channel bandwidth and, at the same time, to preserve high 
speech quality. This is achieved by quantizing the speech samples 
logarithmically. It has been demonstrated that an 8-bit logarithmic quantizer 
produces speech quality equivalent to a 13-bit uniform quantizer. The 
logarithmic quantization is achieved by companding
(COMpress/exPANDing). Two international standards have been
established for companding: the μ-law (used in the United States and
Japan), and the A-law (used in Europe). Detailed descriptions of μ-law and
A-law companding are presented in an application report on companding
routines included in the book Digital Signal Processing Applications with the
TMS320 Family (literature number SPRA012A).


During transmission, logarithmically compressed data in sign-magnitude
form are transmitted along the communications channel. If any processing
is necessary, these data should be expanded to a 14-bit (for μ-law) or 13-bit
(for A-law) linear format. This operation occurs when data is received at the
digital signal processor. After processing, and in order to continue
transmission, the result is compressed back to 8-bit format and transmitted
through the channel.


Example 12-28 and Example 12-29 show μ-law compression and
expansion (i.e., linear to μ-law and μ-law to linear conversion), while
Example 12-30 and Example 12-31 show A-law compression and
expansion. For expansion, using a look-up table is an alternative approach.
It trades memory space for speed of execution. Since the compressed data
is 8 bits long, a table with 256 entries can be constructed to contain the
expanded data. If the compressed data is stored in the register AR0, the
following two instructions will put the expanded data in register R0:

        ADDI    @TABL,AR0       ; @TABL = BASE ADDRESS OF TABLE
        LDI     *AR0,R0         ; PUT EXPANDED NUMBER IN R0

The same look-up table approach could be used for compression, but the
required table length would then be 16,384 words for μ-law or 8,192 words
for A-law. If this memory size is not acceptable, the subroutines presented
in Example 12-28 or Example 12-30 should be used.

Example 12-28. μ-Law Compression


*
*  TITLE μ-LAW COMPRESSION
*
*  SUBROUTINE MUCMPR
*
*  TYPICAL CALLING SEQUENCE:
*
*       LAJU    MUCMPR
*       LDI     v,R0
*       NOP             <---- can be other non-pipeline-break
*       NOP             <---- instructions
*
*  ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------
*   R0       | v = NUMBER TO BE CONVERTED
*
*  REGISTERS USED AS INPUT: R0
*  REGISTERS MODIFIED: R0, R1
*  REGISTER CONTAINING RESULT: R0
*
*  CYCLES: 15    WORDS: 15
*
        .global MUCMPR
*
MUCMPR  LSH3    -6,R0,R1        ; Save sign of number
        ABSI    R0,R0
        CMPI    1FDEH,R0        ; If R0 > 0x1FDE,
        LDIGT   1FDEH,R0        ; saturate the result
        ADDI    33,R0           ; Add bias
        FLOAT   R0              ; Normalize: (seg+5) 0WXYZx...x
        MPYF    0.03125,R0      ; Adjust segment number by 2**(-5)
        LSH     1,R0            ; (seg) WXYZx...x
        PUSHF   R0
        POP     R0              ; Treat number as integer
        LSH     -20,R0          ; Right-justify
        BUD     R11             ; Delayed return
        AND     080H,R1         ; Set sign bit
        ADDI    R1,R0           ; R0 = compressed number
        NOT     R0              ; Reverse all bits for transmission

Example 12-29. μ-Law Expansion
*
*  TITLE μ-LAW EXPANSION
*
*  SUBROUTINE MUXPND
*
*  TYPICAL CALLING SEQUENCE:
*
*       LAJU    MUXPND
*       LDI     v,R0
*       NOP             <---- can be other non-pipeline-break
*       NOP             <---- instructions
*
*  ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------
*   R0       | v = NUMBER TO BE CONVERTED
*
*  REGISTERS USED AS INPUT: R0
*  REGISTERS MODIFIED: R0, R1, R2
*  REGISTER CONTAINING RESULT: R0
*
*  CYCLES: 14 (WORST CASE)    WORDS: 14
*
        .global MUXPND
*
MUXPND  NOT     R0,R0           ; Complement bits
        AND3    0FH,R0,R1       ; Isolate quantization bin
        LSH     1,R1
        ADDI    33,R1           ; Add bias to introduce 1xxxx1
        LSH     -4,R0           ; Isolate segment code
        TSTB    08H,R0          ; Test sign
        BZD     R11             ; If positive, delayed return
        AND     7,R0
        LSH3    R0,R1,R0        ; Shift and put result in R0
        SUBI    33,R0           ; Subtract bias
        BUD     R11             ; Delayed return
        NEGI    R0              ; Negate if a negative number
        NOP
        NOP
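The steps carried out by Example 12-28 and Example 12-29 can be followed more easily in a high-level model. The Python sketch below mirrors the comments in those listings (saturate at 0x1FDE, add a bias of 33, take the segment from the position of the leading one, keep four quantization bits, invert for transmission). The function names are hypothetical, and bit_length stands in for the FLOAT-based normalization trick used on the ’C40.

```python
def mu_compress(x):
    """14-bit linear sample -> 8-bit mu-law code, following Example 12-28."""
    sign = 0x80 if x < 0 else 0x00
    mag = min(abs(x), 0x1FDE) + 33        # saturate, then add bias
    seg = mag.bit_length() - 6            # leading one at bit 5 -> segment 0
    quant = (mag >> (seg + 1)) & 0x0F     # four bits after the leading one
    return ~(sign | (seg << 4) | quant) & 0xFF   # reverse all bits


def mu_expand(code):
    """8-bit mu-law code -> linear sample, following Example 12-29."""
    code = ~code & 0xFF                   # complement bits
    quant = code & 0x0F                   # quantization bin
    seg = (code >> 4) & 0x07              # segment code
    mag = (((quant << 1) + 33) << seg) - 33   # form 1xxxx1, shift, remove bias
    return -mag if code & 0x80 else mag
```

In this model, a zero sample compresses to 0xFF, and the largest positive code, 0x80, expands to 8031. Expansion does not recover the original sample exactly, but compressing an expanded value returns the same code.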
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Example 12-30. A-Law Compression 


*
*  TITLE A-LAW COMPRESSION
*
*  SUBROUTINE ACMPR
*
*  TYPICAL CALLING SEQUENCE:
*
*       LAJ     ACMPR
*       LDI     v,R0
*       NOP             <---- can be other non-pipeline-break
*       NOP             <---- instructions
*
*  ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------
*   R0       | v = NUMBER TO BE CONVERTED
*
*  REGISTERS USED AS INPUT: R0
*  REGISTERS MODIFIED: R0, R1
*  REGISTER CONTAINING RESULT: R0
*
*  CYCLES: 17    WORDS: 17
*
        .global ACMPR
*
ACMPR   LSH3    -5,R0,R1        ; Save sign of number
        ABSI    R0,R0
        CMPI    1FH,R0          ; If R0 < 0x20,
        BLED    END             ; do linear coding
        CMPI    0FFFH,R0        ; If R0 > 0xFFF,
        LDIGT   0FFFH,R0        ; saturate the result
        LSH     -1,R0           ; Eliminate rightmost bit
        FLOAT   R0              ; Normalize: (seg+3) 0WXYZx...x
        MPYF    0.125,R0        ; Adjust segment number by 2**(-3)
        LSH     1,R0            ; (seg) WXYZx...x
        PUSHF   R0
        POP     R0              ; Treat number as integer
        LSH     -20,R0          ; Right-justify
END     BUD     R11             ; Delayed return
        AND     080H,R1         ; Set sign bit
        ADDI    R1,R0           ; R0 = compressed number
        XOR     0D5H,R0         ; Invert even bits for transmission

Example 12-31. A-Law Expansion


*
*  TITLE A-LAW EXPANSION
*
*  SUBROUTINE AXPND
*
*  TYPICAL CALLING SEQUENCE:
*
*       LAJU    AXPND
*       LDI     v,R0
*       NOP             <---- can be other non-pipeline-break
*       NOP             <---- instructions
*
*  ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------
*   R0       | v = NUMBER TO BE CONVERTED
*
*  REGISTERS USED AS INPUT: R0
*  REGISTERS MODIFIED: R0, R1, R2
*  REGISTER CONTAINING RESULT: R0
*
*  CYCLES: 19 (WORST CASE)    WORDS: 16
*
        .global AXPND
*
AXPND   XOR     0D5H,R0,R2      ; Invert even bits
        ASH3    -4,R2,R0        ; Store for bit sign
        AND     7,R0            ; Isolate segment code
        BZD     SKIP1
        AND3    0FH,R2,R1       ; Isolate quantization bin
        LSH     1,R1
        ADDI    1,R1            ; Create 0xxxx1
        ADDI    32,R1           ; or 1xxxx1
        SUBI    1,R0
SKIP1   LSH3    R0,R1,R0        ; Shift and put result in R0
        TSTB    80H,R2          ; Test sign bit
        BZAT    R11             ; If positive, delayed return and
*                               ; annul next three instructions
        NEGI    R0              ; Negate if a negative number
        NOP
        NOP
        BU      R11             ; Return
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12.4.2 FIR, IIR, and Adaptive Filters 


Digital filters are a common requirement for digital signal processing
systems. There are two types of digital filters: finite impulse response
(FIR) and infinite impulse response (IIR). Each of these types can have
either fixed or adaptable coefficients. In this section, the fixed-coefficient
filters are presented first, and then the adaptive filters are discussed.


12.4.2.1 FIR Filters 


If the FIR filter has an impulse response h[0], h[1],..., h[N-1], and x[n]
represents the input of the filter at time n, the output y[n] at time n is
given by this equation:


     y[n] = h[0] x[n] + h[1] x[n-1] + ... + h[N-1] x[n-(N-1)]


Two features of the ’C40 that facilitate the implementation of the FIR filters
are parallel multiply/add operations and circular addressing. The first one
permits the performance of a multiplication and an addition in a single
machine cycle, while the second one makes a finite buffer of length N
sufficient for the data x.
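The role of the circular buffer can be sketched in Python. This is a model of the idea only, not of the ’C40 addressing hardware: the class name CircularFIR is hypothetical, and the modulo operation on the index plays the part that the BK register and the % addressing mode play in the assembly code.

```python
class CircularFIR:
    """FIR filter that keeps the last N inputs in a circular buffer."""

    def __init__(self, h):
        self.h = list(h)                  # h[0] ... h[N-1]
        self.n = len(self.h)
        self.buf = [0.0] * self.n         # circular queue for x
        self.pos = 0                      # slot holding the oldest sample

    def step(self, x):
        """Accept input x[n] and return y[n]."""
        self.buf[self.pos] = x            # newest sample replaces the oldest
        self.pos = (self.pos + 1) % self.n   # now points at the oldest again
        acc = 0.0
        idx = self.pos
        # Walk from the oldest to the newest input: h[N-1] pairs with
        # x[n-(N-1)], and h[0] pairs with x[n].
        for k in range(self.n - 1, -1, -1):
            acc += self.h[k] * self.buf[idx]
            idx = (idx + 1) % self.n
        return acc


if __name__ == "__main__":
    fir = CircularFIR([1.0, 2.0, 3.0])
    print([fir.step(x) for x in [1.0, 0.0, 0.0, 0.0]])
```

Feeding a unit impulse returns the coefficients in order, which is a quick way to confirm that the pairing of coefficients and buffer slots is right. No data is moved between calls; only the index advances.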


Figure 12—1 shows the arrangement of the memory locations in order 
to implement circular addressing, while Example 12-32 presents the 
‘C40 assembly code for an FIR filter. 


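Figure 12-1. Data Memory Organization for an FIR Filter

[Figure: three columns (impulse response, initial input samples, final input
samples), each drawn from low address to high address. The impulse response
column starts with h(N-1) at the low address. The initial input samples run
from the oldest input, x[n-(N-1)], at the low address to the newest input at
the high address. The input samples form a circular queue.]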
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In order to set up circular addressing, initialize the block-size register BK to 
block length N. Also, the locations for signal x should start from a memory 
location whose address is a multiple of the smallest power of 2 that is greater 
than N. For instance, if N = 24, the first address for x should be a multiple 
of 32 (the lower 5 bits of the beginning address should be zero). To under- 
stand this requirement, look at Section 5.3 on page 5-25, Circular Address- 
ing. 


In Example 12-32, the pointer to the input sequence x is incremented and 
assumed to be moving from an older input to a newer input. At the end of 
the subroutine, AR1 will point to the position for the next input sample. 


Example 12-32. FIR Filter 
*
*  TITLE FIR FILTER
*
*  SUBROUTINE FIR
*
*  EQUATION: y(n) = h(0) * x(n) + h(1) * x(n-1) +
*                   ... + h(N-1) * x(n-(N-1))
*
*  TYPICAL CALLING SEQUENCE:
*
*       LOAD    AR0
*       LAJU    FIR
*       LOAD    AR1
*       LOAD    RC
*       LOAD    BK
*
*  ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+---------------------------
*   AR0      | ADDRESS OF h(N-1)
*   AR1      | ADDRESS OF x(N-1)
*   RC       | LENGTH OF FILTER - 2 (N-2)
*   BK       | LENGTH OF FILTER (N)
*
*  REGISTERS USED AS INPUT: AR0, AR1, RC, BK
*  REGISTERS MODIFIED: R0, R2, AR0, AR1, RC
*  REGISTER CONTAINING RESULT: R0
*
*  CYCLES: 7 + N    WORDS: 9
*
        .global FIR
*
FIR     RPTBD   CONV                     ; Set up the repeat cycle
        MPYF3   *AR0++(1),*AR1++(1)%,R0  ; Initialize R0:
*                                        ; h(N-1) * x(n-(N-1)) -> R0
        LDF     0.0,R2                   ; Initialize R2
        NOP
*
*  FILTER (1 <= i < N)
*
CONV    MPYF3   *AR0++(1),*AR1++(1)%,R0  ; h(N-1-i) * x(n-(N-1-i)) -> R0
||      ADDF3   R0,R2,R2                 ; Multiply and add operation
*
        BUD     R11                      ; Delayed return
        ADDF    R0,R2,R0                 ; Add last product
        NOP
        NOP
*
*  end
*
        .end


12.4.2.2 IIR Filters


The transfer function of the IIR filters has both poles and zeros. Its output
depends on both the input and the past output. As a rule, the filters need less
computation than an FIR with similar frequency response, but the filters
have the drawback of being sensitive to coefficient quantization. Most often,
the IIR filters are implemented as a cascade of second-order sections,
called biquads. Example 12-33 and Example 12-34 show the implementation
for one biquad and for any number of biquads, respectively.

The difference equation for a single biquad is:


     y[n] = a1 y[n-1] + a2 y[n-2] + b0 x[n] + b1 x[n-1] + b2 x[n-2]


However, the following two equations are more convenient and have
smaller storage requirements:


     d[n] = a2 d[n-2] + a1 d[n-1] + x[n]
     y[n] = b2 d[n-2] + b1 d[n-1] + b0 d[n]


Figure 12-2 shows the memory organization for this two-equation
approach, an implementation of a single biquad on the ’C40.


Figure 12-2. Data Memory Organization for a Single Biquad

[Figure: the filter coefficients are stored from low address to high address
in the order a2, b2, a1, b1, b0. Beside them, the initial and final delay
node values d(n-1) and d(n-2), from newest delay to oldest delay, are held
in a circular queue.]


As in the case of FIR filters, the address for the start of the values d must
be a multiple of 4; i.e., the last two bits of the beginning address must be
zero. The block-size register BK must be initialized to 3.


Example 12-33. IIR Filter (One Biquad)


*
*  TITLE IIR FILTER
*
*  SUBROUTINE IIR1
*
*  IIR1 == IIR FILTER (ONE BIQUAD)
*
*  EQUATIONS: d(n) = a2 * d(n-2) + a1 * d(n-1) + x(n)
*             y(n) = b2 * d(n-2) + b1 * d(n-1) + b0 * d(n)
*
*  OR         y(n) = a1*y(n-1) + a2*y(n-2) + b0*x(n) + b1*x(n-1)
*                    + b2*x(n-2)
*
*  TYPICAL CALLING SEQUENCE:
*
*       load    R2
*       LAJU    IIR1
*       load    AR0
*       load    AR1
*       load    BK
*
*  ARGUMENT ASSIGNMENTS:
*  ARGUMENT | FUNCTION
*  ---------+-------------------------------------------
*  R2       | INPUT SAMPLE X(N)
*  AR0      | ADDRESS OF FILTER COEFFICIENTS (A2)
*  AR1      | ADDRESS OF DELAY NODE VALUES (D(N-2))
*  BK       | BK = 3
*
*  REGISTERS USED AS INPUT: R2, AR0, AR1, BK
*  REGISTERS MODIFIED: R0, R1, R2, AR0, AR1
*  REGISTER CONTAINING RESULT: R0
*
*  CYCLES: 8    WORDS: 8
*
        .global IIR1
*
IIR1    MPYF3   *AR0,*AR1,R0            ; a2 * d(n-2) -> R0
        MPYF3   *++AR0(1),*AR1--(1)%,R1 ; b2 * d(n-2) -> R1
*
        MPYF3   *++AR0(1),*AR1,R0       ; a1 * d(n-1) -> R0
||      ADDF3   R0,R2,R2                ; a2*d(n-2)+x(n) -> R2
*
        MPYF3   *++AR0(1),*AR1--(1)%,R0 ; b1 * d(n-1) -> R0
||      ADDF3   R0,R2,R2                ; a1*d(n-1)+a2*d(n-2)+x(n) -> R2
*
        BUD     R11                     ; Delayed return
*
        MPYF3   *++AR0(1),R2,R2         ; b0 * d(n) -> R2
||      STF     R2,*AR1++(1)%           ; Store d(n) and point to d(n-1)
*
        ADDF    R0,R2                   ; b1*d(n-1)+b0*d(n) -> R2
        ADDF    R1,R2,R0                ; b2*d(n-2)+b1*d(n-1)+b0*d(n) -> R0
*
*  end
*
        .end
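
The two-equation biquad can also be sketched in a high-level language to check coefficient sets before they are loaded into the filter. The Python fragment below is only an illustrative sketch, not a translation of the assembly routine: the names make_biquad, step, delay, and newest are hypothetical, and the three-slot list merely imitates the circular queue that the routine addresses with BK = 3.

```python
def make_biquad(a1, a2, b0, b1, b2):
    """Return a per-sample filter function for one biquad (two-equation form)."""
    delay = [0.0, 0.0, 0.0]   # three-slot circular queue (imitates BK = 3)
    newest = 0                # index of the slot currently holding d(n-1)

    def step(x):
        nonlocal newest
        d1 = delay[newest]               # d(n-1)
        d2 = delay[(newest + 1) % 3]     # d(n-2)
        d0 = a2 * d2 + a1 * d1 + x       # d(n) = a2 d(n-2) + a1 d(n-1) + x(n)
        y = b2 * d2 + b1 * d1 + b0 * d0  # y(n) = b2 d(n-2) + b1 d(n-1) + b0 d(n)
        newest = (newest - 1) % 3        # step the queue pointer back one slot
        delay[newest] = d0               # d(n) is now the newest delay value
        return y

    return step
```

Calling the returned function once per input sample yields the output sequence; the oldest delay value is overwritten only after it has been used, which is the same property the circular queue provides in Example 12-33.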


In the more general case, the IIR filter contains N > 1 biquads. The equations
for its implementation are given by the following pseudo-C language code:


y[-1,n] = x[n]
for (i = 0; i < N; i++) {
    d[i,n] = a2[i] d[i,n-2] + a1[i] d[i,n-1] + y[i-1,n]
    y[i,n] = b2[i] d[i,n-2] + b1[i] d[i,n-1] + b0[i] d[i,n]
}
y[n] = y[N-1,n]


Figure 12-3 shows the corresponding memory organization, while
Example 12-34 shows the ’C40 assembly-language code.
Figure 12-3. Data Memory Organization for N Biquads


[Figure: the filter coefficients are stored from the low address to the
high address. The initial and final delay node values are shown from the
newest delay to the oldest delay, through d(N-1, n-1), with the values of
each biquad held in a circular queue.]


The block-size register BK should be initialized to 3, and the beginning of
each set of d values (that is, d[i,n], i = 0...N-1) should be at an address
that is a multiple of 4 (the last two bits zero), as stated in the case of a
single biquad.


Example 12-34. IIR Filters (N > 1 Biquads)


*
*  TITLE IIR FILTERS (N > 1 BIQUADS)
*
*  SUBROUTINE IIR2
*
*  EQUATIONS: y(-1,n) = x(n)
*
*             FOR (i = 0; i < N; i++)
*             {
*               d(i,n) = a2(i) * d(i,n-2) + a1(i) * d(i,n-1) + y(i-1,n)
*               y(i,n) = b2(i) * d(i,n-2) + b1(i) * d(i,n-1) + b0(i) * d(i,n)
*             }
*
*             y(n) = y(N-1,n)
*
*  TYPICAL CALLING SEQUENCE:
*
*       load    R2
*       load    AR0
*       load    AR1
*       load    IR0
*       LAJU    IIR2
*       load    IR1
*       load    BK
*       load    RC
*
*  ARGUMENT ASSIGNMENTS:
*  ARGUMENT | FUNCTION
*  ---------+-------------------------------------------
*  R2       | INPUT SAMPLE x(n)
*  AR0      | ADDRESS OF FILTER COEFFICIENTS (a2(0))
*  AR1      | ADDRESS OF DELAY NODE VALUES (d(0,n-2))
*  BK       | BK = 3
*  IR0      | IR0 = 4
*  IR1      | IR1 = 4*N-4
*  RC       | NUMBER OF BIQUADS (N) - 2
*
*  REGISTERS USED AS INPUT: R2, AR0, AR1, IR0, IR1, BK, RC
*  REGISTERS MODIFIED: R0, R1, R2, AR0, AR1, RC
*  REGISTER CONTAINING RESULT: R0
*
*  CYCLES: 4 + 6N    WORDS: 15
*
        .global IIR2
*
IIR2    MPYF3   *AR0,*AR1,R0             ; a2(0) * d(0,n-2) -> R0
        MPYF3   *++AR0(1),*AR1--(1)%,R1  ; b2(0) * d(0,n-2) -> R1
*
        RPTBD   LOOP                     ; Set loop for 1 <= i < N
*
        MPYF3   *++AR0(1),*AR1,R0        ; a1(0) * d(0,n-1) -> R0
||      ADDF3   R0,R2,R2                 ; First sum term of d(0,n).
*
        MPYF3   *++AR0(1),*AR1--(1)%,R0  ; b1(0) * d(0,n-1) -> R0
||      ADDF3   R0,R2,R2                 ; Second sum term of d(0,n).
*
        MPYF3   *++AR0(1),R2,R2          ; b0(0) * d(0,n) -> R2
||      STF     R2,*AR1--(1)%            ; Store d(0,n); point to d(0,n-2).
*
*  LOOP STARTS HERE
*
        MPYF3   *++AR0(1),*++AR1(IR0),R0 ; a2(i) * d(i,n-2) -> R0
||      ADDF3   R0,R2,R2                 ; First sum term of y(i-1,n).
*                                        ; Pipeline hit on previous
*                                        ; instruction
        MPYF3   *++AR0(1),*AR1--(1)%,R1  ; b2(i) * d(i,n-2) -> R1
||      ADDF3   R1,R2,R2                 ; Second sum term of y(i-1,n).
*
        MPYF3   *++AR0(1),*AR1,R0        ; a1(i) * d(i,n-1) -> R0
||      ADDF3   R0,R2,R2                 ; First sum term of d(i,n).
*
        MPYF3   *++AR0(1),*AR1--(1)%,R0  ; b1(i) * d(i,n-1) -> R0
||      ADDF3   R0,R2,R2                 ; Second sum term of d(i,n).
*
LOOP    MPYF3   *++AR0(1),R2,R2          ; b0(i) * d(i,n) -> R2
||      STF     R2,*AR1--(1)%            ; Store d(i,n); point to d(i,n-2).
*
*  FINAL SUMMATION
*
        BUD     R11                      ; Delayed return
*
        ADDF    R0,R2                    ; First sum term of y(N-1,n)
        ADDF3   R1,R2,R0                 ; Second sum term of y(N-1,n)
        LDI     *AR1--(IR1),R1           ; Return to first biquad
||      LDI     *AR1--(1)%,R2            ; Point to d(0,n-1)
*
*  end
*
        .end
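
The cascade recursion above can be mirrored in a few lines of Python for checking a set of biquad coefficients. This is a sketch under stated assumptions rather than a model of the ’C40 routine: the function name iir_cascade, the tuple order (a1, a2, b0, b1, b2), and the two-element state lists are all hypothetical choices made for the illustration.

```python
def iir_cascade(x, sections, state):
    """Filter one sample through a cascade of biquads.

    sections -- list of (a1, a2, b0, b1, b2) tuples, one per biquad
    state    -- list of [d(i,n-1), d(i,n-2)] pairs, updated in place
    """
    y = x                                      # input to the first biquad is x(n)
    for (a1, a2, b0, b1, b2), d in zip(sections, state):
        d0 = a2 * d[1] + a1 * d[0] + y         # d(i,n)
        y = b2 * d[1] + b1 * d[0] + b0 * d0    # y(i,n), input to the next biquad
        d[1], d[0] = d[0], d0                  # age the delay node values
    return y                                   # y(n) = y(N-1,n)
```

The output of each biquad is passed directly to the next one, so no intermediate y values need to be stored; only the two delay node values per biquad persist between samples.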


12.4.2.3 Adaptive Filters (LMS Algorithm) 


In some applications in digital signal processing, a filter must be adapted
over time to keep track of changing conditions. The book Theory and Design
of Adaptive Filters by Treichler, Johnson, and Larimore (Wiley-Interscience,
1987) presents the theory of adaptive filters. Although, in theory, both FIR
and IIR structures can be used as adaptive filters, the stability problems
and the local optimum points that IIR filters exhibit make them less
attractive for such an application. Hence, until further research makes IIR
filters a better choice, only FIR filters are used in the adaptive algorithms
of practical applications.


In an adaptive FIR filter, the filtering equation takes this form:


y[n] = h[n,0] x[n] + h[n,1] x[n-1] + ... + h[n,N-1] x[n-(N-1)]


The filter coefficients are time-dependent. In a least-mean-squares (LMS)
algorithm, the coefficients are updated by an equation in this form:


h[n+1,i] = h[n,i] + b x[n-i],    i = 0, 1, ..., N-1


Here, b is a constant for the computation. The updating of the filter
coefficients can be interleaved with the computation of the filter output so
that it takes 3 cycles per filter tap to do both. The updated coefficients
are written over the old filter coefficients. Example 12-35 shows the
implementation of an adaptive FIR filter on the ’C40. The memory organization
and the positioning of the data in memory should follow the same rules as
for the FIR filter with fixed coefficients described above.
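
The interleaving of filtering and coefficient update can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the ’C40 implementation: the name lms_step and the list-based argument layout are hypothetical, and the scale factor b is passed in as a precomputed number, in the same way that the assembly routine expects it in a register.

```python
def lms_step(h, x_hist, b):
    """One LMS iteration: filter with h(n,i), then update h in place.

    h      -- coefficients h(n,0) ... h(n,N-1)
    x_hist -- samples x(n), x(n-1), ..., x(n-(N-1))
    b      -- scale factor, constant for this computation
    Returns y(n), computed with the coefficients before the update.
    """
    y = 0.0
    for i in range(len(h)):
        y += h[i] * x_hist[i]     # accumulate h(n,i) * x(n-i)
        h[i] += b * x_hist[i]     # h(n+1,i) = h(n,i) + b * x(n-i)
    return y
```

Each tap is read once for the output and rewritten once with its updated value, which is why the updated coefficients can overwrite the old ones without extra storage.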


Example 12-35. Adaptive FIR Filter (LMS Algorithm)


*
*  TITLE ADAPTIVE FIR FILTER (LMS ALGORITHM)
*
*  SUBROUTINE LMS
*
*  LMS == LMS ADAPTIVE FILTER
*
*  EQUATIONS: y(n) = h(n,0)*x(n) + h(n,1)*x(n-1) + ...
*                    + h(n,N-1)*x(n-(N-1))
*
*             FOR (i = 0; i < N; i++)
*                h(n+1,i) = h(n,i) + tmuerr * x(n-i)
*
*  TYPICAL CALLING SEQUENCE:
*
*       load    R4
*       load    AR0
*       LAJU    LMS
*       load    AR1
*       load    RC
*       load    BK
*
*  ARGUMENT ASSIGNMENTS:
*  ARGUMENT | FUNCTION
*  ---------+-------------------------------------------
*  R4       | SCALE FACTOR (2 * mu * err)
*  AR0      | ADDRESS OF h(n,N-1)
*  AR1      | ADDRESS OF x(n-(N-1))
*  RC       | LENGTH OF FILTER - 2 (N-2)
*  BK       | LENGTH OF FILTER (N)
*
*  REGISTERS USED AS INPUT: R4, AR0, AR1, RC, BK
*  REGISTERS MODIFIED: R0, R1, R2, AR0, AR1, RC
*  REGISTER CONTAINING RESULT: R0
*
*  PROGRAM SIZE: 12 words
*
*  EXECUTION CYCLES: 6 + 3N
*
*  SETUP (i = 0)
*
        .global LMS
LMS     RPTBD   LOOP                ; Setup the delayed repeat block.
*                                   ; Initialize R0:
        MPYF3   *AR0,*AR1,R0        ; h(n,N-1) * x(n-(N-1)) -> R0
||      SUBF3   R2,R2,R2            ; Initialize R2
*                                   ; Initialize R1:
        MPYF3   *AR1++(1)%,R4,R1    ; x(n-(N-1)) * tmuerr -> R1
        ADDF3   *AR0++(1),R1,R1     ; h(n,N-1) + x(n-(N-1)) *
*                                   ; tmuerr -> R1
*
*  FILTER AND UPDATE (1 <= i < N)
*
*                                   ; Filter:
        MPYF3   *AR0--(1),*AR1,R0   ; h(n,N-1-i) * x(n-(N-1-i)) -> R0
||      ADDF3   R0,R2,R2            ; Multiply and add operation.
*                                   ; Update:
        MPYF3   *AR1++(1)%,R4,R1    ; x(n-(N-1-i)) * tmuerr -> R1
||      STF     R1,*AR0++(1)        ; R1 -> h(n+1,N-1-(i-1))
*
LOOP    ADDF3   *AR0++(1),R1,R1     ; h(n,N-1-i) + x(n-(N-1-i))
*                                   ; * tmuerr -> R1
*
        BUD     R11                 ; Delayed return
*
        ADDF3   R0,R2,R0            ; Add last product.
        STF     R1,*-AR0(1)         ; h(n,0) + x(n) * tmuerr ->
*                                   ; h(n+1,0)
        NOP
*
*  end
*
        .end
12.4.3 Matrix-Vector Multiplication


In matrix-vector multiplication, a K x N matrix of elements m(i,j), having K
rows and N columns, is multiplied by an N x 1 vector to produce a K x 1
result. The multiplier vector has elements v(j), and the product vector has
elements p(i). Each of the product-vector elements is computed by the
following expression:


p(i) = m(i,0) v(0) + m(i,1) v(1) + ... + m(i,N-1) v(N-1),    i = 0, 1, ..., K-1


This is essentially a dot product, and the matrix-vector multiplication
contains, as a special case, the dot product presented in Example 12-2 on
page 12-10 and Example 12-3 on page 12-12. In pseudo-C format, the
computation of the matrix multiplication is expressed by


for (i = 0; i < K; i++) {
    p(i) = 0
    for (j = 0; j < N; j++)
        p(i) = p(i) + m(i,j) * v(j)
}


Figure 12-4 shows the data memory organization for matrix-vector
multiplication, and Example 12-36 shows the ’C40 assembly code to implement
it. Note that in Example 12-36, K (number of rows) should be greater than
0, and N (number of columns) should be greater than 1.
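
For reference, the pseudo-C loop can be written as a small runnable Python function. This is a minimal sketch, assuming the matrix is stored as a flat list in row-major order (one row after another); the function name mat_vec and its argument order are hypothetical.

```python
def mat_vec(m, v, k, n):
    """Multiply a K x N matrix (flat, row-major list m) by the vector v."""
    p = []
    for i in range(k):                   # loop over the rows
        acc = 0.0
        for j in range(n):               # dot product over the columns
            acc += m[i * n + j] * v[j]   # p(i) = p(i) + m(i,j) * v(j)
        p.append(acc)
    return p
```

For example, mat_vec([1, 2, 3, 4, 5, 6], [1, 0, -1], 2, 3) multiplies a 2 x 3 matrix by a 3-element vector and returns the two row dot products.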


Figure 12-4. Data Memory Organization for Matrix-Vector Multiplication


[Figure: three blocks of memory, each running from the low address to the
high address: matrix storage, beginning with m(0,0); input vector storage,
v(0) through v(N-1); and result vector storage, p(0) through p(K-1).]
Example 12-36. Matrix Times a Vector Multiplication


*
*  TITLE MATRIX TIMES A VECTOR MULTIPLICATION
*
*  SUBROUTINE MAT
*
*  MAT == MATRIX TIMES A VECTOR OPERATION
*
*  TYPICAL CALLING SEQUENCE:
*
*       load    AR0
*       load    AR1
*       load    AR2
*       load    AR3
*       load    R1
*       CALL    MAT
*
*  ARGUMENT ASSIGNMENTS:
*  ARGUMENT | FUNCTION
*  ---------+-------------------------------------------
*  AR0      | ADDRESS OF M(0,0)
*  AR1      | ADDRESS OF V(0)
*  AR2      | ADDRESS OF P(0)
*  AR3      | NUMBER OF ROWS - 1 (K-1)
*  R1       | NUMBER OF COLUMNS - 2 (N-2)
*
*  REGISTERS USED AS INPUT: AR0, AR1, AR2, AR3, R1
*  REGISTERS MODIFIED: R0, R2, AR0, AR1, AR2, AR3, IR0, RC
*
*  PROGRAM SIZE: 11
*
*  EXECUTION CYCLES: 5 + 7K + KN = 5 + ((N-1) + 8) * K
*
        .global MAT
*
*  SETUP
*
MAT     ADDI3   R1,2,IR0                ; IR0 = N
*
*  FOR (i = 0; i < K; i++) LOOP OVER THE ROWS.
*
ROWS    RPTBD   DOT                     ; Setup: multiply a row by a
*                                       ; column.
        LDI     R1,RC                   ; Set loop counter
        LDF     0.0,R2                  ; Initialize R2
        MPYF3   *AR0++(1),*AR1++(1),R0  ; m(i,0) * v(0) -> R0
*
*  FOR (j = 1; j < N; j++) DO DOT PRODUCT OVER COLUMNS
*
DOT     MPYF3   *AR0++(1),*AR1++(1),R0  ; m(i,j) * v(j) -> R0
||      ADDF3   R0,R2,R2                ; m(i,j-1) * v(j-1) +
*                                       ; R2 -> R2.
*
        DBD     AR3,ROWS                ; Counts the number of rows
*                                       ; left.
*
        ADDF    R0,R2                   ; Last accumulate.
        STF     R2,*AR2++(1)            ; Result -> p(i)
        NOP     *--AR1(IR0)             ; Set AR1 to point to v(0).
*  !!! DELAYED BRANCH HAPPENS HERE !!!
*
*  RETURN SEQUENCE
*
        RETS                            ; Return
*
*  end
*
        .end


12.4.4 Fast Fourier Transforms (FFT) 


Fourier transforms are an important tool often used in digital signal
processing systems. The purpose of the transform is to convert information
from the time domain to the frequency domain. The inverse Fourier transform
converts information back to the time domain from the frequency domain.
Computationally efficient implementations of Fourier transforms are known
as fast Fourier transforms (FFTs). The theory of FFTs can be found in books
such as DFT/FFT and Convolution Algorithms by C.S. Burrus and T.W. Parks
(John Wiley, 1985), and in the book Digital Signal Processing Applications
with the TMS320 Family.


Certain ’C40 features that support efficient implementation of numerically
intensive algorithms are particularly well suited to FFTs. The high speed
of the device (40-ns cycle time) makes the implementation of real-time
algorithms easier, while the floating-point capability eliminates the
problems associated with dynamic range. The powerful indexing scheme in
indirect addressing facilitates the access of FFT butterfly legs that have
different spans. The repeat block implemented by the RPTB or RPTBD
instruction reduces the looping overhead in algorithms heavily dependent on
loops (such as FFTs). This construct gives the efficiency of in-line coding
but has the form of a loop. Since the output of the FFT is in scrambled
(bit-reversed) order when the input is in regular order, it must be restored
to the proper order. This rearrangement does not require extra cycles: the
device has a special form of indirect addressing (bit-reversed addressing
mode) that can be used when the FFT output is needed. The ’C40 can implement
this mode on either the CPU or the DMA. This mode permits accessing the FFT
output in the proper order. If the DMA transfer with bit-reversed addressing
mode is used, there is no overhead for data input and output.


There are several types of FFTs:


-  Radix-2 and radix-4 algorithms, depending on the size of the FFT
   butterfly
-  Decimation in time or decimation in frequency (DIT or DIF)
-  Complex or real FFTs
-  FFTs of different lengths, etc.


The FFT implementation examples in this section are based on programs
contained in the application report “An Implementation of FFT, DCT, and
Other Transforms on the TMS320C30” by Panos Papamichalis, in Digital
Signal Processing Applications with the TMS320 Family, Volume III.


Example 12-37 and Example 12-38 show the implementation of a complex,
radix-2, DIF FFT on the ’C40. Example 12-37 contains the generic code of
the FFT, which can be used with any transform length. However, the complete
implementation of an FFT also needs a table of twiddle factors
(sines/cosines), and this table depends on the size of the transform. To
retain the generic form of Example 12-37, the table with the twiddle factors
(containing 1-1/4 complete cycles of a sine) is presented separately in
Example 12-38 for the case of a 64-point FFT. A full cycle of the sine should
have a number of points equal to the FFT size. Example 12-38 also defines
the FFT length N and the value M, the logarithm of N to a base equal to the
radix; M is the number of stages of the FFT. For a 64-point FFT, M = 6 when
using a radix-2 algorithm, or M = 3 when using a radix-4 algorithm. If the
table with the twiddle factors and the FFT code are kept in separate files,
they should be connected at link time.
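
A table of this shape can be generated for other transform sizes with a few lines of Python. This is a minimal sketch, assuming the table holds sin(2*pi*k/N) for k = 0 to 5N/4 - 1, so that the cosine values are the same table read starting N/4 entries later; the function name twiddle_table is hypothetical.

```python
import math


def twiddle_table(n):
    """Return 1-1/4 cycles of a sine sampled at n points per full cycle."""
    return [math.sin(2.0 * math.pi * k / n) for k in range(n + n // 4)]
```

For N = 64 the table has 80 entries; its second entry rounds to 0.098017, and entry 16, the start of the cosine portion, is 1.0.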
Example 12-37. Complex, Radix-2, DIF FFT


*
*  TITLE COMPLEX, RADIX-2, DIF FFT
*
*  GENERIC PROGRAM FOR A LOOPED-CODE RADIX-2 FFT COMPUTATION
*  IN 320C40
*
*  THE PROGRAM IS DERIVED FROM THE BURRUS AND PARKS
*  BOOK, "DFT/FFT AND CONVOLUTION ALGORITHMS", PAGE 111.
*  THE (COMPLEX) DATA RESIDE IN INTERNAL MEMORY.
*  THE COMPUTATION IS DONE IN-PLACE, BUT THE
*  RESULT IS MOVED TO ANOTHER MEMORY SECTION TO
*  DEMONSTRATE THE BIT-REVERSED ADDRESSING.
*
*  THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A
*  .DATA SECTION. THIS DATA IS INCLUDED IN A SEPARATE
*  FILE TO PRESERVE THE GENERIC NATURE OF THE PROGRAM.
*  FOR THE SAME PURPOSE, THE SIZE OF THE FFT N AND
*  LOG2(N) ARE DEFINED IN A .GLOBL DIRECTIVE AND
*  SPECIFIED DURING LINKING.
*
        .globl  FFT                 ; Entry point for execution
        .globl  N                   ; FFT size
        .globl  M                   ; LOG2(N)
        .globl  SINE                ; Address of sine table
INP     .usect  "IN",1024           ; Memory with input data
        .bss    OUTP,1024           ; Memory with output data
        .text
*  INITIALIZE
FFTSIZ  .word   N
LOGFFT  .word   M
SINTAB  .word   SINE
INPUT   .word   INP
OUTPUT  .word   OUTP
FFT:    LDP     FFTSIZ              ; Command to load data page pointer
        LDI     @FFTSIZ,R7          ; R7 = N2
        LSH3    -2,R7,IR1           ; IR1 = N/4, pointer for SIN/COS table
        LDI     @LOGFFT,R9          ; R9 holds the remaining stage number
        LSH3    1,R7,IR0            ; IR0 = 2*N1 (because of real/imag)
        LDI     1,R8                ; Initialize repeat counter of first
*                                   ; loop
        LDI     1,AR5               ; Initialize IE index (AR5 = IE)
        LDI     @INPUT,R10          ; R10 points to X(I)
*  OUTER LOOP
LOOP:   RPTBD   BLK1                ; Setup for first loop
        LDI     R10,AR0             ; AR0 points to X(I)
        ADDI    R7,AR0,AR2          ; AR2 points to X(L)
        SUBI3   1,R8,RC             ; RC should be one less than
*                                   ; desired #
*  FIRST LOOP
        ADDF    *AR0,*AR2,R0        ; R0 = X(I) + X(L)
        SUBF    *AR2++,*AR0++,R1    ; R1 = X(I) - X(L)
        ADDF    *AR2,*AR0,R2        ; R2 = Y(I) + Y(L)
        SUBF    *AR2,*AR0,R3        ; R3 = Y(I) - Y(L)
        STF     R2,*AR0--           ; Y(I) = R2 and...
||      STF     R3,*AR2--           ; Y(L) = R3
BLK1    STF     R0,*AR0++(IR0)      ; X(I) = R0 and...
||      STF     R1,*AR2++(IR0)      ; X(L) = R1 and AR0,2 = AR0,2 + 2*n
*  IF THIS IS THE LAST STAGE, YOU ARE DONE
        SUBI    1,R9
        BZD     END
*  MAIN INNER LOOP
        LDI     2,AR1               ; Init loop counter for inner loop
        LDI     @SINTAB,AR4         ; Initialize IA index (AR4 = IA)
        ADDI    AR5,AR4             ; IA = IA + IE; AR4 points to cosine
        ADDI    R10,AR1,AR0         ; (X(I),Y(I)) pointer
        ADDI    2,AR1               ; Increase inner loop counter
INLOP:  RPTBD   BLK2                ; Setup for second loop
        ADDI    R7,AR0,AR2          ; (X(L),Y(L)) pointer
        SUBI    1,R8,RC             ; RC should be one less than
*                                   ; desired #
        LDF     *AR4,R6             ; R6 = SIN
*  SECOND LOOP
        SUBF    *AR2,*AR0,R2        ; R2 = X(I) - X(L)
        SUBF    *+AR2,*+AR0,R1      ; R1 = Y(I) - Y(L)
        MPYF    R2,R6,R0            ; R0 = R2*SIN and...
||      ADDF    *+AR2,*+AR0,R3      ; R3 = Y(I) + Y(L)
        MPYF    R1,*+AR4(IR1),R3    ; R3 = R1*COS and...
||      STF     R3,*+AR0            ; Y(I) = Y(I) + Y(L)
        SUBF    R0,R3,R4            ; R4 = R1*COS - R2*SIN
        MPYF    R1,R6,R0            ; R0 = R1*SIN and...
||      ADDF    *AR2,*AR0,R3        ; R3 = X(I) + X(L)
        MPYF    R2,*+AR4(IR1),R3    ; R3 = R2*COS and...
||      STF     R3,*AR0++(IR0)      ; X(I) = X(I) + X(L) and
*                                   ; AR0 = AR0 + 2*N1
        ADDF    R0,R3,R5            ; R5 = R2*COS + R1*SIN
BLK2    STF     R5,*AR2++(IR0)      ; X(L) = R2*COS + R1*SIN, incr AR2
*                                   ; and...
||      STF     R4,*+AR2            ; Y(L) = R1*COS - R2*SIN
        CMPI    R7,AR1
        BNEAF   INLOP               ; Loop back to the inner loop
        ADDI    AR5,AR4             ; IA = IA + IE; AR4 points to cosine
        ADDI    R10,AR1,AR0         ; (X(I),Y(I)) pointer
        ADDI    2,AR1               ; Increase inner loop counter
        LSH     1,R8                ; Increment loop counter for
*                                   ; next time
        BRD     LOOP                ; Next FFT stage (delayed)
        LSH     1,AR5               ; IE = 2*IE
        LDI     R7,IR0              ; N1 = N2
        LSH     -1,R7               ; N2 = N2/2
*  STORE RESULT OUT USING BIT-REVERSED ADDRESSING
END:    LDI     @FFTSIZ,IR0         ; IR0 = size of FFT = N
        SUBI3   2,IR0,RC            ; RC = N - 2
        LDI     2,IR1
        RPTBD   BITRV
        LDI     @INPUT,AR0
        LDI     @OUTPUT,AR1
        LDF     *+AR0(1),R0
*  BIT REVERSE LOOP
        LDF     *AR0++(IR0)B,R1
||      STF     R0,*+AR1(1)
BITRV   LDF     *+AR0(1),R0
||      STF     R1,*AR1++(IR1)
        LDF     *AR0++(IR0)B,R1
||      STF     R0,*+AR1(1)
        STF     R1,*AR1++(IR1)
SELF    BR      SELF                ; Branch to itself at the end
        .end
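
The stage structure of a radix-2, decimation-in-frequency FFT can be followed more easily in a compact high-level sketch. The Python function below is an illustration of the general DIF algorithm, not a transcription of the assembly program: it uses Python complex numbers instead of separate real and imaginary words, computes each twiddle factor directly instead of reading a table, and the name fft_dif_radix2 is hypothetical.

```python
import cmath


def fft_dif_radix2(x):
    """Radix-2 DIF FFT of a power-of-2 length list.

    The result is returned in bit-reversed order.
    """
    x = list(x)
    n = len(x)
    span = n // 2                        # distance between butterfly legs
    while span >= 1:
        for start in range(0, n, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))  # twiddle factor
                a = x[start + j]
                b = x[start + j + span]
                x[start + j] = a + b                 # upper leg: sum
                x[start + j + span] = (a - b) * w    # lower leg: difference * W
        span //= 2                       # the next stage halves the span
    return x
```

The values come out in bit-reversed order, so a final reordering pass (done in the example above with bit-reversed addressing) is needed before the spectrum is read in natural order.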
Example 12-38. Table with Twiddle Factors for a 64-Point FFT


*
*  TITLE TABLE WITH TWIDDLE FACTORS FOR A 64-POINT FFT
*
*  FILE TO BE LINKED WITH THE SOURCE CODE FOR A 64-POINT,
*  RADIX-2 FFT.
*
        .globl  SINE
        .globl  N
        .globl  M
N       .set    64
M       .set    6
        .data
SINE
        .float  0.000000
        .float  0.098017
        .float  0.195090
        .float  0.290285
        .float  0.382683
        .float  0.471397
        .float  0.555570
        .float  0.634393
        .float  0.707107
        .float  0.773010
        .float  0.831470
        .float  0.881921
        .float  0.923880
        .float  0.956940
        .float  0.980785
        .float  0.995185
COSINE
        .float  1.000000
        .float  0.995185
        .float  0.980785
        .float  0.956940
        .float  0.923880
        .float  0.881921
        .float  0.831470
        .float  0.773010
        .float  0.707107
        .float  0.634393
        .float  0.555570
        .float  0.471397
        .float  0.382683
        .float  0.290285
        .float  0.195090
        .float  0.098017
        .float  0.000000
        .float  -0.098017
        .float  -0.195090
        .float  -0.290285
        .float  -0.382683
        .float  -0.471397
        .float  -0.555570
        .float  -0.634393
        .float  -0.707107
        .float  -0.773010
        .float  -0.831470
        .float  -0.881921
        .float  -0.923880
        .float  -0.956940
        .float  -0.980785
        .float  -0.995185
        .float  -1.000000
        .float  -0.995185
        .float  -0.980785
        .float  -0.956940
        .float  -0.923880
        .float  -0.881921
        .float  -0.831470
        .float  -0.773010
        .float  -0.707107
        .float  -0.634393
        .float  -0.555570
        .float  -0.471397
        .float  -0.382683
        .float  -0.290285
        .float  -0.195090
        .float  -0.098017
        .float  0.000000
        .float  0.098017
        .float  0.195090
        .float  0.290285
        .float  0.382683
        .float  0.471397
        .float  0.555570
        .float  0.634393
        .float  0.707107
        .float  0.773010
        .float  0.831470
        .float  0.881921
        .float  0.923880
        .float  0.956940
        .float  0.980785
        .float  0.995185


The radix-2 algorithm has tutorial value because it is relatively easy to
understand how the FFT algorithm functions. However, radix-4 implementations
can increase the speed of execution by reducing the overall arithmetic
required. Example 12-39 shows the generic implementation of a complex, DIF
FFT in radix-4. A companion table, like the one in Example 12-38, should
define M as the base-4 logarithm of N.
Example 12-39. Complex, Radix-4, DIF FFT


*
*  TITLE COMPLEX, RADIX-4, DIF FFT
*
*  GENERIC PROGRAM TO DO A LOOPED-CODE RADIX-4 FFT COMPUTATION IN
*  THE TMS320C40.
*
*  THE PROGRAM IS DERIVED FROM THE BURRUS AND PARKS BOOK,
*  DFT/FFT AND CONVOLUTION ALGORITHMS, P. 117. THE
*  (COMPLEX) DATA RESIDE IN INTERNAL MEMORY, AND THE
*  COMPUTATION IS DONE IN-PLACE.
*
*  THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A
*  .DATA SECTION. THIS DATA IS INCLUDED IN A SEPARATE FILE TO
*  PRESERVE THE GENERIC NATURE OF THE PROGRAM. FOR THE SAME
*  PURPOSE, THE SIZE OF THE FFT N AND LOG4(N) ARE DEFINED IN A
*  .GLOBL DIRECTIVE AND SPECIFIED DURING LINKING.
*
*  IN ORDER TO HAVE THE FINAL RESULT IN BIT-REVERSED ORDER, THE
*  TWO MIDDLE BRANCHES OF THE RADIX-4 BUTTERFLY ARE INTERCHANGED
*  DURING STORAGE. NOTE THIS DIFFERENCE WHEN COMPARING WITH THE
*  PROGRAM IN P. 117 OF THE BURRUS AND PARKS BOOK.
*
        .globl  FFT                 ; Entry point for execution
        .globl  N                   ; FFT size
        .globl  M                   ; LOG4(N)
        .globl  SINE                ; Address of sine table
INP     .usect  "IN",1024           ; Memory with input data
        .bss    OUTP,1024           ; Memory with output data
        .text
*  INITIALIZE
FFTSIZ  .word   N                   ; FFT size
LOGFFT  .word   M                   ; LOG4(FFTSIZ)
SINTAB  .word   SINE                ; Sine/cosine table base
INPUT   .word   INP                 ; Area with input data to process
OUTPUT  .word   OUTP                ; Area with output data to process
FFT:    LDP     FFT                 ; Command to load data page pointer
        LDI     @FFTSIZ,BK
        LSH3    1,BK,IR0            ; IR0 = 2*N1 (because of real/imag)
        LSH3    -2,BK,IR1           ; IR1 = N/4, pointer for SIN/COS table
        LDI     1,AR7               ; Initialize IE index
        LDI     1,R8                ; Initialize repeat counter of first
*                                   ; loop.
        ADDI    2,IR1,R9            ; R9 = JT = R0/2 + 2
        LSH     -1,BK               ; R7 = N2
*  OUTER LOOP
LOOP:   LDI     @INPUT,AR0          ; AR0 points to X(I)
        ADDI    BK,AR0,AR1          ; AR1 points to X(I1)
        ADDI    BK,AR1,AR2          ; AR2 points to X(I2)
        RPTBD   BLK1                ; Setup loop BLK1
        ADDI    BK,AR2,AR3          ; AR3 points to X(I3)
        SUBI3   1,R8,RC             ; RC should be one less
*                                   ; than desired #
        LDF     *+AR1,R0            ; R0 = Y(I1)
*  FIRST LOOP: BLK1
        ADDF    R0,*+AR3,R3         ; R3 = Y(I1) + Y(I3)
        ADDF    *+AR0,*+AR2,R1      ; R1 = Y(I) + Y(I2)
        ADDF    R3,R1,R6            ; R6 = R1 + R3
        SUBF    *+AR2,*+AR0,R4      ; R4 = Y(I) - Y(I2)
        LDF     *AR2,R5             ; R5 = X(I2)
||      STF     R6,*+AR0            ; Y(I) = R1 + R3
        SUBF    R3,R1               ; R1 = R1 - R3
        ADDF    *AR3,*AR1,R3        ; R3 = X(I1) + X(I3)
        ADDF    R5,*AR0,R1          ; R1 = X(I) + X(I2)
||      STF     R1,*+AR1            ; Y(I1) = R1 - R3
        ADDF    R3,R1,R6            ; R6 = R1 + R3
        SUBF    R5,*AR0,R4          ; R4 = X(I) - X(I2)
||      STF     R6,*AR0++(IR0)      ; X(I) = R1 + R3
        SUBF    R3,R1               ; R1 = R1 - R3
        SUBF    *AR3,*AR1,R6        ; R6 = X(I1) - X(I3)
        SUBF    R0,*+AR3,R3         ; -R3 = Y(I1) - Y(I3)
||      STF     R1,*AR1++(IR0)      ; X(I1) = R1 - R3
        SUBF    R6,R4,R5            ; R5 = R4 - R6
        ADDF    R6,R4               ; R2 = R4 + R6
        STF     R5,*+AR2            ; Y(I2) = R4 - R6
||      STF     R2,*+AR3            ; Y(I3) = R4 + R6
        SUBF    R3,R2,R5            ; R5 = R2 - R3
        ADDF    R3,R2               ; R2 = R2 + R3
        STF     R2,*AR3++(IR0)      ; X(I3) = R2 + R3
BLK1    STF     R5,*AR2++(IR0)      ; X(I2) = R2 - R3
||      LDF     *+AR1,R0            ; R0 = Y(I1)
*  IF THIS IS THE LAST STAGE, YOU ARE DONE
        CMPI    IR1,R8
        BZD     END
*  MAIN INNER LOOP
        LDI     1,R10               ; Init IA1 index
        LDI     2,R11               ; Init loop counter for inner loop
        LDI     R11,AR0
        ADDI    @INPUT,AR0          ; (X(I),Y(I)) pointer
        ADDI    2,R11               ; Increment inner loop counter
INLOP:  ADDI    R10,AR7             ; IA1 = IA1 + IE
        ADDI    BK,AR0,AR1          ; (X(I1),Y(I1)) pointer
        CMPI    R9,R11              ; If LPCNT = JT, go to
        BZD     SPCL                ; special butterfly
        ADDI    BK,AR1,AR2          ; (X(I2),Y(I2)) pointer
        ADDI    BK,AR2,AR3          ; (X(I3),Y(I3)) pointer
        SUBI3   1,R8,RC             ; RC should be one less than
*                                   ; desired #
        LDI     R10,AR4
        ADDI    @SINTAB,AR4         ; Create cosine index AR4
        ADDI    AR4,R10,AR5
        SUBI    1,AR5               ; IA2 = IA1 + IA1 - 1
        RPTBD   BLK2                ; Setup loop BLK2
        ADDI    R10,AR5,AR6
        SUBI    1,AR6               ; IA3 = IA2 + IA1 - 1
        LDF     *+AR2,R7            ; R7 = Y(I2)
*  SECOND LOOP: BLK2
        ADDF    R7,*+AR0,R3         ; R3 = Y(I) + Y(I2)
        ADDF    *+AR3,*+AR1,R5      ; R5 = Y(I1) + Y(I3)
        ADDF    R5,R3,R6            ; R6 = R3 + R5
        SUBF    R7,*+AR0,R4         ; R4 = Y(I) - Y(I2)
        SUBF    R5,R3               ; R3 = R3 - R5
        ADDF    *AR2,*AR0,R1        ; R1 = X(I) + X(I2)
        ADDF    *AR3,*AR1,R5        ; R5 = X(I1) + X(I3)
        MPYF    R3,*+AR5(IR1),R6    ; R6 = R3*CO2
||      STF     R6,*+AR0            ; Y(I) = R3 + R5
        ADDF    R5,R1,R0            ; R0 = R1 + R5
        SUBF    *AR2,*AR0,R2        ; R2 = X(I) - X(I2)
        SUBF    R5,R1               ; R1 = R1 - R5
        MPYF    R1,*AR5,R0          ; R0 = R1*SI2
||      STF     R0,*AR0++(IR0)      ; X(I) = R1 + R5
        SUBF    R0,R6               ; R6 = R3*CO2 - R1*SI2
        SUBF    *+AR3,*+AR1,R5      ; R5 = Y(I1) - Y(I3)
        MPYF    R1,*+AR5(IR1),R0    ; R0 = R1*CO2
||      STF     R6,*+AR1            ; Y(I1) = R3*CO2 - R1*SI2
        MPYF    R3,*AR5,R6          ; R6 = R3*SI2
        ADDF    R0,R6               ; R6 = R1*CO2 + R3*SI2
        ADDF    R5,R2,R1            ; R1 = R2 + R5
        SUBF    R5,R2               ; R2 = R2 - R5
        SUBF    *AR3,*AR1,R5        ; R5 = X(I1) - X(I3)
        SUBF    R5,R4,R3            ; R3 = R4 - R5
        ADDF    R5,R4               ; R4 = R4 + R5
        MPYF    R3,*+AR4(IR1),R6    ; R6 = R3*CO1
||      STF     R6,*AR1++(IR0)      ; X(I1) = R1*CO2 + R3*SI2
        MPYF    R1,*AR4,R0          ; R0 = R1*SI1
        SUBF    R0,R6               ; R6 = R3*CO1 - R1*SI1
        MPYF    R1,*+AR4(IR1),R6    ; R6 = R1*CO1
||      STF     R6,*+AR2            ; Y(I2) = R3*CO1 - R1*SI1
        MPYF    R3,*AR4,R0          ; R0 = R3*SI1
        ADDF    R0,R6               ; R6 = R1*CO1 + R3*SI1
        MPYF    R4,*+AR6(IR1),R6    ; R6 = R4*CO3
||      STF     R6,*AR2++(IR0)      ; X(I2) = R1*CO1 + R3*SI1
        MPYF    R2,*AR6,R0          ; R0 = R2*SI3
        SUBF    R0,R6               ; R6 = R4*CO3 - R2*SI3
        MPYF    R2,*+AR6(IR1),R6    ; R6 = R2*CO3
||      STF     R6,*+AR3            ; Y(I3) = R4*CO3 - R2*SI3
        MPYF    R4,*AR6,R0          ; R0 = R4*SI3
        ADDF    R0,R6               ; R6 = R2*CO3 + R4*SI3
BLK2    STF     R6,*AR3++(IR0)      ; X(I3) = R2*CO3 + R4*SI3
||      LDF     *+AR2,R7            ; Load next Y(I2)
        CMPI    R11,BK
        BPD     INLOP               ; Loop back to the inner loop
        LDI     R11,AR0
        ADDI    @INPUT,AR0          ; (X(I),Y(I)) pointer
        ADDI    2,R11               ; Increment inner loop counter
        BRD     CONT
        LSH     2,R8                ; Increment repeat counter for
*                                   ; next time
        LSH     2,AR7               ; IE = 4*IE
        LDI     BK,IR0              ; N1 = N2
*  SPECIAL BUTTERFLY FOR W=J
SPCL    RPTBD   BLK3                ; Setup loop BLK3
        LSH3    -1,IR1,AR4          ; Point to SIN(45)
        ADDI    @SINTAB,AR4         ; Create cosine index AR4 = CO21
        LDF     *AR2,R7             ; R7 = X(I2)
*  SPCL LOOP: BLK3
        ADDF    R7,*AR0,R1          ; R1 = X(I) + X(I2)
        ADDF    *+AR2,*+AR0,R3      ; R3 = Y(I) + Y(I2)
        SUBF    *+AR2,*+AR0,R4      ; R4 = Y(I) - Y(I2)
        ADDF    *AR3,*AR1,R5        ; R5 = X(I1) + X(I3)
        SUBF    R1,R5,R6            ; R6 = R5 - R1
        ADDF    R5,R1               ; R1 = R1 + R5
        ADDF    *+AR3,*+AR1,R5      ; R5 = Y(I1) + Y(I3)
        SUBF    R5,R3,R0            ; R0 = R3 - R5
        ADDF    R5,R3               ; R3 = R3 + R5
        SUBF    R7,*AR0,R2          ; R2 = X(I) - X(I2)
||      STF     R3,*+AR0            ; Y(I) = R3 + R5
        LDF     *AR3,R7             ; R7 = X(I3)
||      STF     R1,*AR0++(IR0)      ; X(I) = R1 + R5
        SUBF    R7,*AR1,R1          ; R1 = X(I1) - X(I3)
||      STF     R6,*+AR1            ; Y(I1) = R5 - R1
        SUBF    *+AR3,*+AR1,R3      ; R3 = Y(I1) - Y(I3)
        ADDF    R3,R2,R5            ; R5 = R2 + R3
        SUBF    R2,R3,R2            ; R2 = -R2 + R3
        SUBF    R1,R4,R3            ; R3 = R4 - R1
        ADDF    R1,R4               ; R4 = R4 + R1
        SUBF    R5,R3,R1            ; R1 = R3 - R5
        MPYF    R1,*AR4,R1          ; R1 = R1*CO21
||      STF     R0,*AR1++(IR0)      ; X(I1) = R3 - R5
        ADDF    R5,R3               ; R3 = R3 + R5
        MPYF    R3,*AR4,R3          ; R3 = R3*CO21
||      STF     R1,*+AR2            ; Y(I2) = (R3 - R5)*CO21
        SUBF    R4,R2,R1            ; R1 = R2 - R4
        MPYF    R1,*AR4,R1          ; R1 = R1*CO21
||      STF     R3,*AR2++(IR0)      ; X(I2) = (R3 + R5)*CO21
        ADDF    R4,R2               ; R2 = R2 + R4
        MPYF3   R2,*AR4,R2          ; R2 = R2*CO21
||      STF     R1,*+AR3            ; Y(I3) = -(R4 - R2)*CO21
BLK3    LDF     *AR2,R7             ; Load next X(I2)
||      STF     R2,*AR3++(IR0)      ; X(I3) = (R4 + R2)*CO21
        CMPI    R11,BK
        BPD     INLOP               ; Loop back to the inner loop
        LDI     R11,AR0
        ADDI    @INPUT,AR0          ; (X(I),Y(I)) pointer
        ADDI    2,R11               ; Increment inner loop counter
        LSH     2,R8                ; Increment repeat counter for next
*                                   ; time
        LSH     2,AR7               ; IE = 4*IE
        LDI     BK,IR0              ; N1 = N2
CONT    BRD     LOOP                ; Next FFT stage (delayed)
        LSH     -2,BK               ; N2 = N2/4
        LSH3    -1,BK,R9
        ADDI    2,R9                ; JT = N2/2 + 2
*  STORE RESULT OUT USING BIT-REVERSED ADDRESSING
END:    LDI     @FFTSIZ,IR0         ; IR0 = size of FFT = N
        SUBI3   2,IR0,RC            ; RC = N - 2
        LDI     2,IR1
        RPTBD   BITRV
        LDI     @INPUT,AR0
        LDI     @OUTPUT,AR1
        LDF     *+AR0(1),R0
*  BIT REVERSE LOOP
        LDF     *AR0++(IR0)B,R1
||      STF     R0,*+AR1(1)
BITRV   LDF     *+AR0(1),R0
||      STF     R1,*AR1++(IR1)
        LDF     *AR0++(IR0)B,R1
||      STF     R0,*+AR1(1)
        STF     R1,*AR1++(IR1)
SELF    BR      SELF                ; Branch to itself at the end.
        .end


Most often, the data to be transformed is a sequence of real numbers. In this
case, the FFT demonstrates certain symmetries that permit the reduction
of the computational load even further. Example 12-40 shows the generic
implementation of a real-valued, radix-2 FFT. For such an FFT, the total
storage required for a length-N transform is only N locations; in a complex
FFT, 2N are necessary. Recovery of the rest of the points is based on the
symmetry conditions.
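
The symmetry in question is that, for a real input sequence, transform bin N-k is the complex conjugate of bin k. The Python sketch below illustrates only this recovery step, using a direct DFT as a reference; it does not reproduce the packed storage format of the real-valued FFT program, and the names dft and expand_half_spectrum are hypothetical.

```python
import cmath


def dft(x):
    """Direct DFT, used here only as a reference for the symmetry."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def expand_half_spectrum(half, n):
    """Rebuild all n bins from bins 0..n/2 of a real sequence's transform."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())   # X(k) = conj(X(n-k))
    return full
```

Bins 0 and N/2 are purely real and bins 1 through N/2 - 1 are complex, which accounts for the N storage locations: 2 + 2(N/2 - 1) = N real values.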
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Example 12-40. Real, Radix-2 FFT 


TITLE REAL, RADIX-2 FFT 


GENERIC PROGRAM TO DO A RADIX-2 REAL FFT COMPUTATION 
IN 320C40. 


THE PROGRAM IS DERIVED FROM THE PAPER BY SORENSEN ET AL., 
JUNE 1987 ISSUE OF THE TRANSACTIONS ON ASSP. 


THE REAL DATA RESIDE IN INTERNAL MEMORY. THE COMPUTATION IS 
DONE IN-PLACE. THE BIT-REVERSAL IS DONE AT THE BEGINNING OF 
THE PROGRAM. 


THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A .DATA 
SECTION. THIS DATA IS INCLUDED IN A SEPARATE FILE TO PRESERVE 
THE GENERIC NATURE OF THE PROGRAM. FOR THE SAME PURPOSE, THE 
SIZE OF THE FFT N AND LOG2(N) ARE DEFINED IN A .GLOBL 
DIRECTIVE AND SPECIFIED DURING LINKING. THE LENGTH OF 

THE TABLE IS N/4 + N/4 = N/2. 


+ + H+ + + He HF HF HF HH HF KF KF HF HK KH HK HK KF 


-globl FFT Entry point for execution 


~globl N ; FFT size 
~-globl M ; LOG2 (N) 
-globl SINE ; Address of sine table 
        .bss    INP,1024                ; Memory with input data
        .text
* INITIALIZE
FFTSIZ  .word   N
LOGFFT  .word   M
SINTAB  .word   SINE
INPUT   .word   INP
FFT:    LDP     FFTSIZ                  ; Command to load data page pointer
* DO THE BIT-REVERSING AT THE BEGINNING
        LDI     @FFTSIZ,R8              ; R8 = N
        SUBI    1,R8,RC                 ; RC should be one less
                                        ; than desired #
        LDI     @SINTAB,R9
        RPTBD   BITRV                   ; Setup for BITRV loop
        LSH3    -1,R8,IR0               ; IR0 = half the size of FFT = N/2
        LDI     @INPUT,AR0              ; AR0 points to X(I)
        LDI     @INPUT,AR1              ; AR1 points to X(I)
* DIGIT REVERSE COUNTER
        CMPI    AR1,AR0                 ; Exchange locations only
        LDF     *AR0++(1),R1            ; if AR0<AR1
||      LDF     *AR1++(IR0)B,R0
        LDFLT   *AR0,R0
        LDFLT   *AR1,R1
BITRV   STF     R0,*AR1
||      STF     R1,*AR0
* LENGTH-TWO BUTTERFLIES
        LDI     @INPUT,AR0              ; AR0 points to X(I)
        RPTBD   BLK1                    ; Setup for BLK1 loop
        SUBI3   1,IR0,RC                ; RC = (N/2) - 1
        LDF     *+AR0(1),R2             ; R2 = X(I + 1)
        LDI     2,IR0                   ; IR0 = 2 = N2
* BLK1 LOOP
        ADDF    R2,*AR0,R0              ; R0 = X(I) + X(I + 1)
        SUBF    R2,*AR0,R1              ; R1 = X(I) - X(I + 1)
||      STF     R0,*AR0++(1)            ; X(I) = X(I) + X(I + 1)
BLK1    LDF     *+AR0(IR0),R2           ; Load next X(I)
||      STF     R1,*AR0++(1)            ; X(I + 1) = X(I) - X(I + 1)
* FIRST PASS OF THE DO-20 LOOP (STAGE K = 2 IN DO-10 LOOP)
        LDI     @INPUT,AR0              ; AR0 points to X(I)
        RPTBD   BLK2                    ; Setup for BLK2 loop
        LSH3    -2,R8,RC                ; Repeat N/4 times
        SUBI    1,RC                    ; RC should be one less
                                        ; than desired #
        LDF     *+AR0(IR0),R2           ; R2 = X(I + 2)
* BLK2 LOOP
        ADDF    R2,*AR0++(IR0),R0       ; R0 = X(I) + X(I + 2)
        SUBF    R2,*-AR0(IR0),R1        ; R1 = X(I) - X(I + 2)
||      STF     R0,*-AR0(IR0)           ; X(I) = X(I) + X(I + 2)
        NEGF    *+AR0,R0                ; R0 = -X(I + 3)
||      STF     R1,*AR0++(IR0)          ; X(I + 2) = X(I) - X(I + 2)
BLK2    LDF     *+AR0(IR0),R2           ; Load next X(I + 2)
||      STF     R0,*-AR0                ; X(I + 3) = -X(I + 3)
* MAIN LOOP (FFT STAGES)
        LSH3    -3,R8,IR0               ; IR0 = E/2 index for E
        LDI     3,R11                   ; R11 holds the current
                                        ; stage number
        LDI     2,R4                    ; R4 = N4
        LDI     4,R3                    ; R3 = N2
LOOP    LDI     @INPUT,AR5              ; AR5 points to X(I)
        LSH3    2,R4,R10                ; Set loop counter
        ADDI3   IR0,R9,AR0              ; AR0 points to SIN/COS table


* INNER LOOP (DO-20 LOOP IN THE PROGRAM)
INLOP   LDI     R4,IR1                  ; IR1 = N4 or N2/2
        ADDI3   1,AR5,AR1               ; AR1 points to X(I1)
        ADDI3   R3,AR1,AR3              ; AR3 points to
                                        ; X(I3) = X(I + J + N2)
        SUBI3   2,AR3,AR2               ; AR2 points to
                                        ; X(I2) = X(I - J + N2)
        ADDI3   R3,AR2,AR4              ; AR4 points to
                                        ; X(I4) = X(I - J + N1)
        LDF     *AR5++(IR1),R0          ; R0 = X(I)
        ADDF3   *+AR5(IR1),R0,R1        ; R1 = X(I) + X(I + N2)
        SUBF3   R0,*++AR5(IR1),R0       ; R0 = -X(I) + X(I + N2)
||      STF     R1,*-AR5(IR1)           ; X(I) = X(I) + X(I + N2)
        NEGF    R0                      ; R0 = X(I) - X(I + N2)
        NEGF    *++AR5(IR1),R1          ; R1 = -X(I + N4 + N2)
||      STF     R0,*AR5                 ; X(I + N2) = X(I) - X(I + N2)
        STF     R1,*AR5                 ; X(I + N4 + N2) = -X(I + N4 + N2)
* INNERMOST LOOP
        RPTBD   BLK3                    ; Setup for BLK3 loop
        LSH3    -2,R8,IR1               ; IR1 = separation between
                                        ; SIN/COS tbls
        SUBI3   2,R4,RC                 ; Repeat N4 - 1 times
        LDF     *AR3,R5                 ; R5 = X(I3)
        MPYF3   R5,*+AR0(IR1),R0        ; R0 = X(I3)*COS
        MPYF3   *AR4,*AR0,R1            ; R1 = X(I4)*SIN
        MPYF3   *AR4,*+AR0(IR1),R1      ; R1 = X(I4)*COS
||      ADDF3   R0,R1,R2                ; R2 = X(I3)*COS + X(I4)*SIN
        MPYF3   R5,*AR0++(IR0),R0       ; R0 = X(I3)*SIN
        SUBF3   R0,R1,R0                ; R0 = -X(I3)*SIN + X(I4)*COS
        SUBF3   *AR2,R0,R1              ; R1 = -X(I2) + R0
        ADDF3   *AR2,R0,R1              ; R1 = X(I2) + R0
||      STF     R1,*AR3++               ; X(I3) = -X(I2) + R0
        ADDF3   *AR1,R2,R1              ; R1 = X(I1) + R2
||      STF     R1,*AR4--               ; X(I4) = X(I2) + R0
        SUBF3   R2,*AR1,R1              ; R1 = X(I1) - R2
||      STF     R1,*AR1++               ; X(I1) = X(I1) + R2
BLK3    LDF     *AR3,R5                 ; Load next X(I3)
||      STF     R1,*AR2--               ; X(I2) = X(I1) - R2
        ...     @FFTSIZ,R10
        ...     INLOP                   ; Loop back to the inner loop
        ADDI    R4,AR5                  ; AR5 = I + N1
        ...     R10,R10
        ADDI3   IR0,R9,AR0              ; AR0 points to
                                        ; SIN/COS table
        ADDI    1,R11
        CMPI    @LOGFFT,R11
        ...     LOOP
        LSH     -1,IR0                  ; E = E/2
        LSH     1,R4                    ; N4 = 2*N4
        LSH     1,R3                    ; N2 = 2*N2
END     BR      END                     ; Branch to itself at the end.
Example 12-37, Example 12-39, and Example 12-40 make the FFT algorithm
easy to follow, but they are not optimized for execution speed.
Example 12-41 shows a faster version of a radix-2 DIT FFT algorithm. This
program uses a different twiddle factor table than the previous examples:
the twiddle factors are stored in bit-reversed order, and the table length
is N/2 (N = FFT length). For instance, if the FFT length is 32, the twiddle
factor table should be:


Address    Coefficient

    0      R{WN(0)}  = COS(2*PI*0/32) = 1
    1      -I{WN(0)} = SIN(2*PI*0/32) = 0
    2      R{WN(4)}  = COS(2*PI*4/32) = 0.707
    3      -I{WN(4)} = SIN(2*PI*4/32) = 0.707
    .
    .
    .
   12      R{WN(3)}  = COS(2*PI*3/32) = 0.831
   13      -I{WN(3)} = SIN(2*PI*3/32) = 0.556
   14      R{WN(7)}  = COS(2*PI*7/32) = 0.195
   15      -I{WN(7)} = SIN(2*PI*7/32) = 0.981
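
The table layout above can be sketched in a few lines of Python. This is only an illustration: the helper names bit_reverse and twiddle_table are hypothetical, and the rule that pair p holds WN(k) with k equal to the bit reversal of p over log2(N/4) bits is inferred from the N = 32 entries listed above, not taken from the assembly source.

```python
import math

def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def twiddle_table(n):
    """Build an N/2-word table: N/4 (cos, sin) pairs in bit-reversed order.

    Assumes n is a power of two and at least 8.
    """
    pairs = n // 4
    bits = pairs.bit_length() - 1        # log2(N/4)
    table = []
    for p in range(pairs):
        k = bit_reverse(p, bits)         # twiddle index stored at pair p
        angle = 2.0 * math.pi * k / n
        table.append(math.cos(angle))    # R{WN(k)}
        table.append(math.sin(angle))    # -I{WN(k)}
    return table

table = twiddle_table(32)
for address in (0, 1, 2, 3, 12, 13, 14, 15):
    print(address, round(table[address], 3))
```

For N = 32 the printed values agree with the eight table rows shown above.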


Example 12-41. Faster Version Complex, Radix-2 DIT FFT 


*
*  TITLE FASTER VERSION COMPLEX, RADIX-2 DIT FFT
*
*  GENERIC PROGRAM FOR A FAST LOOPED-CODE RADIX-2 DIT FFT
*  COMPUTATION IN TMS320C40
*
*  THE PROGRAM IS DERIVED FROM THE PAPER BY RAIMUND MEYER AND
*  KARL SCHWARZ, VOLUME 3, PROCEEDINGS OF ICASSP 90.
*
*  THE (COMPLEX) DATA RESIDE IN INTERNAL MEMORY. THE COMPUTATION
*  IS DONE IN-PLACE, BUT THE RESULT IS MOVED TO ANOTHER MEMORY
*  SECTION TO DEMONSTRATE THE BIT-REVERSED ADDRESSING.
*
*  FOR THIS PROGRAM THE MINIMUM FFT LENGTH IS 32 POINTS BECAUSE
*  OF THE SEPARATE STAGES. FIRST TWO PASSES ARE REALIZED AS A
*  FOUR BUTTERFLY LOOP SINCE THE MULTIPLIES ARE TRIVIAL. THE
*  MULTIPLIER IS ONLY USED FOR A LOAD IN PARALLEL WITH AN ADDF
*  OR SUBF.
*
*  THE TWIDDLE FACTORS ARE SUPPLIED IN A TABLE PUT IN A .DATA
*  SECTION. THIS DATA IS INCLUDED IN A SEPARATE FILE TO PRESERVE
*  THE GENERIC NATURE OF THE PROGRAM. FOR THE SAME PURPOSE, THE
*  SIZE OF THE FFT N AND LOG2(N) ARE DEFINED IN A .GLOBL
*  DIRECTIVE AND SPECIFIED DURING LINKING. THE LENGTH OF
*  THE TABLE IS N/2.
*
        .global fft
        .global n
        .global nhalb
        .global nviert
        .global nachtel
        .global m
        .global sine
        .bss    inp,2048                ; input vector length = 2n
                                        ; (depends on n)
        .bss    outp,2048               ; output vector length = 2n
                                        ; (depends on n)
        .text
fftsiz  .word   n
fg4m2   .word   nviert-2
fg4m3   .word   nviert-3
fg8m2   .word   nachtel-2
fg2     .word   nhalb
fg2m3   .word   nhalb-3
logfft  .word   m
sintab  .word   sine
sintm1  .word   sine-1
sintp2  .word   sine+2
input   .word   inp
inputp2 .word   inp+2
output  .word   outp
* ar0 : AR + AI
* ar1 : BR + BI
* ar2 : CR + CI + CR' + CI'
* ar3 : DR + DI
* ar4 : AR' + AI'
* ar5 : BR' + BI'
* ar6 : DR' + DI'
* ar7 : first twiddle factor = 1
fft:    ldp     fftsiz                  ; load page pointer
        ldi     @fg2,ir0                ; ir0 = n/2 = offset between inputs
        ldi     @sintab,ar7             ; ar7 points to twiddle factor 1
        ldi     @input,ar0              ; ar0 points to AR
        addi    ir0,ar0,ar1             ; ar1 points to BR
        addi    ir0,ar1,ar2             ; ar2 points to CR
        addi    ir0,ar2,ar3             ; ar3 points to DR
        ldi     ar0,ar4                 ; ar4 points to AR'
        ldi     ar1,ar5                 ; ar5 points to BR'
        ldi     ar3,ar6                 ; ar6 points to DR'
        ldi     2,ir1                   ; address offset
        lsh     -1,ir0                  ; ir0 = n/4 = number of
                                        ; R4-butterflies
        subi    2,ir0,rc
************************************************************************
************************************************************************
* fill pipeline
        addf    *ar2,*ar0,r4            ; r4 = AR + CR
        subf    *ar2,*ar0++,r5          ; r5 = AR - CR
        addf    *ar1,*ar3,r6            ; r6 = DR + BR
        subf    *ar1++,*ar3++,r7        ; r7 = DR - BR
        addf    r6,r4,r0                ; AR' = r0 = r4 + r6
        mpyf    *ar3++,*ar7,r1          ; r1 = DI , BR' = r3 = r4 - r6
||      subf    r6,r4,r3
        addf    r1,*ar1,r0              ; r0 = BI + DI , AR' = r0
||      stf     r0,*ar4++
        subf    r1,*ar1++,r1            ; r1 = BI - DI , BR' = r3
||      stf     r3,*ar5++
        addf    r1,r5,r2                ; CR' = r2 = r5 + r1
        mpyf    *+ar2,*ar7,r1           ; r1 = CI , DR' = r3 = r5 - r1
||      subf    r1,r5,r3
        rptbd   blk1                    ; Setup for radix-4 butterfly loop
        addf    r1,*ar0,r2              ; r2 = AI + CI , CR' = r2
||      stf     r2,*ar2++(ir1)
        subf    r1,*ar0++,r6            ; r6 = AI - CI , DR' = r3
||      stf     r3,*ar6++
        addf    r0,r2,r4                ; AI' = r4 = r2 + r0
* radix-4 butterfly loop
        mpyf    *ar2--,*ar7,r0          ; r0 = CR , (BI' = r2 = r2 - r0)
||      subf    r0,r2,r2
        mpyf    *ar1++,*ar7,r1          ; r1 = BR , (CI' = r3 = r6 + r7)
||      addf    r7,r6,r3
        addf    r0,*ar0,r4              ; r4 = AR + CR , (AI' = r4)
||      stf     r4,*ar4++
        subf    r0,*ar0++,r5            ; r5 = AR - CR , (BI' = r2)
||      stf     r2,*ar5++
        subf    r7,r6,r7                ; (DI' = r7 = r6 - r7)
        addf    r1,*ar3,r6              ; r6 = DR + BR , (DI' = r7)
||      stf     r7,*ar6++
        subf    r1,*ar3++,r7            ; r7 = DR - BR , (CI' = r3)
||      stf     r3,*ar2++
        addf    r6,r4,r0                ; AR' = r0 = r4 + r6
        mpyf    *ar3++,*ar7,r1          ; r1 = DI , BR' = r3 = r4 - r6
||      subf    r6,r4,r3
        addf    r1,*ar1,r0              ; r0 = BI + DI , AR' = r0
||      stf     r0,*ar4++
        subf    r1,*ar1++,r1            ; r1 = BI - DI , BR' = r3
||      stf     r3,*ar5++
        addf    r1,r5,r2                ; CR' = r2 = r5 + r1
        mpyf    *+ar2,*ar7,r1           ; r1 = CI , DR' = r3 = r5 - r1
||      subf    r1,r5,r3
        addf    r1,*ar0,r2              ; r2 = AI + CI , CR' = r2
||      stf     r2,*ar2++(ir1)
        subf    r1,*ar0++,r6            ; r6 = AI - CI , DR' = r3
||      stf     r3,*ar6++
blk1    addf    r0,r2,r4                ; AI' = r4 = r2 + r0
* clear pipeline
        subf    r0,r2,r2                ; BI' = r2 = r2 - r0
        addf    r7,r6,r3                ; CI' = r3 = r6 + r7
        stf     r4,*ar4                 ; AI' = r4 , BI' = r2
||      stf     r2,*ar5
        subf    r7,r6,r7                ; DI' = r7 = r6 - r7
        stf     r7,*ar6                 ; DI' = r7 , CI' = r3
||      stf     r3,*--ar2
************************************************************************
* --------------------- THIRD TO LAST-2 STAGE ------------------------ *
************************************************************************
        ldi     @fg2,ir1
        subi    1,ir0,ar5
        ldi     1,ar6
        ldi     @sintab,ar7             ; pointer to twiddle factor
        ldi     0,ar4                   ; group counter
        ldi     @input,ar0              ; upper real butterfly input
stufe   ldi     ar0,ar2                 ; upper real butterfly output
        addi    ir0,ar0,ar3             ; lower real butterfly output
        ldi     ar3,ar1                 ; lower real butterfly input
        lsh     1,ar6                   ; double group count
        lsh     -2,ar5                  ; half butterfly count
        lsh     1,ar5                   ; clear LSB
        lsh     -1,ir0                  ; half step from upper to
                                        ; lower real part
        lsh     -1,ir1
        addi    1,ir1                   ; step from old imaginary to new
                                        ; real value
        ldf     *ar1++,r6               ; dummy load, only for address
                                        ; update
||      ldf     *ar7,r7                 ; r7 = COS
gruppe
* fill pipeline
* ar0 = upper real butterfly input
* ar1 = lower real butterfly input
* ar2 = upper real butterfly output
* ar3 = lower real butterfly output
* the imaginary part has to follow
        ldf     *++ar7,r6               ; r6 = SIN
        mpyf    *ar1--,r6,r1            ; r1 = BI * SIN
||      addf    *++ar4,r0,r3            ; dummy addf for counter update
        mpyf    *ar1,r7,r0              ; r0 = BR * COS
        mpyf    *ar1++,*ar7--,r0        ; r3 = TR = r0 + r1 , r0 = BR * SIN
||      addf    r0,r1,r3
        rptbd   bfly1                   ; Setup for loop bfly1
        mpyf    *ar1++,r7,r1            ; r1 = BI * COS , r2 = AR - TR
||      subf    r3,*ar0,r2
        addf    *ar0++,r3,r5            ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        ldi     ar5,rc
* FIRST BUTTERFLY-TYPE:
*
* TR = BR * COS + BI * SIN
* TI = BR * SIN - BI * COS
* AR' = AR + TR
* AI' = AI - TI
* BR' = AR - TR
* BI' = AI + TI
* loop bfly1
        mpyf    *+ar1,r6,r5             ; r5 = BI * SIN , (AR' = r5)
||      stf     r5,*ar2++
        subf    r1,r0,r2                ; (r2 = TI = r0 - r1)
        mpyf    *ar1,r7,r0              ; r0 = BR * COS ,
                                        ; (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++,r4            ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++
        addf    r0,r5,r3                ; r3 = TR = r0 + r5
        mpyf    *ar1++,r6,r0            ; r0 = BR * SIN ,
                                        ; r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++,r7,r1            ; r1 = BI * COS , (AI' = r4)
||      stf     r4,*ar2++
bfly1   addf    *ar0++,r3,r5            ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
* switch over to next group
        subf    r1,r0,r2                ; r2 = TI = r0 - r1
        addf    r2,*ar0,r3              ; r3 = AI + TI , AR' = r5
||      stf     r5,*ar2++
        subf    r2,*ar0++(ir1),r4       ; r4 = AI - TI , BI' = r3
||      stf     r3,*ar3++(ir1)
        nop     *ar1++(ir1)             ; address update
        mpyf    *ar1--,r7,r1            ; r1 = BI * COS , AI' = r4
||      stf     r4,*ar2++(ir1)
        mpyf    *ar1,r6,r0              ; r0 = BR * SIN
        mpyf    *ar1++,*ar7++,r0        ; r3 = TR = r1 - r0 , r0 = BR * COS
||      subf    r0,r1,r3
        rptbd   bfly2                   ; Setup for loop bfly2
        mpyf    *ar1++,r6,r1            ; r1 = BI * SIN ,
                                        ; r2 = AR - TR
||      subf    r3,*ar0,r2
        addf    *ar0++,r3,r5            ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        ldi     ar5,rc
* SECOND BUTTERFLY-TYPE:
*
* TR = BI * COS - BR * SIN
* TI = BI * SIN + BR * COS
* AR' = AR + TR
* AI' = AI - TI
* BR' = AR - TR
* BI' = AI + TI
* loop bfly2
        mpyf    *+ar1,r7,r5             ; r5 = BI * COS , (AR' = r5)
||      stf     r5,*ar2++
        addf    r1,r0,r2                ; (r2 = TI = r0 + r1)
        mpyf    *ar1,r6,r0              ; r0 = BR * SIN ,
                                        ; (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++,r4            ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++
        subf    r0,r5,r3                ; TR = r3 = r5 - r0
        mpyf    *ar1++,r7,r0            ; r0 = BR * COS , r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++,r6,r1            ; r1 = BI * SIN , (AI' = r4)
||      stf     r4,*ar2++
bfly2   addf    *ar0++,r3,r5            ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
* clear pipeline
        addf    r1,r0,r2                ; r2 = TI = r0 + r1
        addf    r2,*ar0,r3              ; r3 = AI + TI
||      stf     r5,*ar2++               ; AR' = r5
        cmpi    ar6,ar4
        bned    gruppe                  ; do following 3 instructions
        subf    r2,*ar0++(ir1),r4       ; r4 = AI - TI , BI' = r3
||      stf     r3,*ar3++(ir1)
        ldf     *++ar7,r7               ; r7 = COS
||      stf     r4,*ar2++(ir1)          ; AI' = r4
        nop     *ar1++(ir1)             ; branch here
* end of this butterfly group
        cmpi    4,ir0                   ; jump out after ld(n)-3 stage
        bnzaf   stufe
        ldi     @sintab,ar7             ; pointer to twiddle factor
        ldi     0,ar4                   ; group counter
        ldi     @input,ar0              ; upper real butterfly input
************************************************************************
* ------------------------ SECOND LAST STAGE ------------------------- *
************************************************************************
        ldi     @input,ar0              ; upper input
        ldi     ar0,ar2                 ; upper output
        addi    ir0,ar0,ar1             ; lower input
        ldi     ar1,ar3                 ; lower output
        ldi     @sintp2,ar7             ; pointer to twiddle factor
        ldi     5,ir0                   ; distance between two groups
        ldi     @fg8m2,rc
* fill pipeline
* 5. to M. butterfly:
* loop bf2end
        ldf     *ar7++,r7               ; r7 = COS , ((AI' = r4))
||      stf     r4,*ar2++
        ldf     *ar7++,r6               ; r6 = SIN , (BR' = r2)
||      stf     r2,*ar3++
        mpyf    *+ar1,r6,r5             ; r5 = BI * SIN , (AR' = r3)
||      stf     r3,*ar2++
        addf    r1,r0,r2                ; (r2 = TI = r0 + r1)
        mpyf    *ar1,r7,r0              ; r0 = BR * COS ,
                                        ; (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++(ir0),r4       ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++(ir0)
        addf    r0,r5,r3                ; r3 = TR = r0 + r5
        mpyf    *ar1++,r6,r0            ; r0 = BR * SIN , r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++,r7,r1            ; r1 = BI * COS , (AI' = r4)
||      stf     r4,*ar2++(ir0)
        addf    *ar0++,r3,r5            ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        mpyf    *+ar1,r6,r5             ; r5 = BI * SIN , (AR' = r5)
||      stf     r5,*ar2++
        subf    r1,r0,r2                ; (r2 = TI = r0 - r1)
        mpyf    *ar1,r7,r0              ; r0 = BR * COS ,
                                        ; (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++,r4            ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++
        addf    r0,r5,r3                ; r3 = TR = r0 + r5
        mpyf    *ar1++,r6,r0            ; r0 = BR * SIN , r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++(ir0),r7,r1       ; r1 = BI * COS , (AI' = r4)
||      stf     r4,*ar2++
        addf    *ar0++,r3,r3            ; r3 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        mpyf    *+ar1,r7,r5             ; r5 = BI * COS , (AR' = r3)
||      stf     r3,*ar2++
        subf    r1,r0,r2                ; (r2 = TI = r0 - r1)
        mpyf    *ar1,r6,r0              ; r0 = BR * SIN ,
                                        ; (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++(ir0),r4       ; (r4 = AI - TI , BI' = r3)
||      stf     r3,*ar3++(ir0)
        subf    r0,r5,r3                ; r3 = TR = r5 - r0
        mpyf    *ar1++,r7,r0            ; r0 = BR * COS , r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++,r6,r1            ; r1 = BI * SIN , (AI' = r4)
||      stf     r4,*ar2++(ir0)
        addf    *ar0++,r3,r5            ; r5 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        mpyf    *+ar1,r7,r5             ; r5 = BI * COS , (AR' = r5)
||      stf     r5,*ar2++
        addf    r1,r0,r2                ; (r2 = TI = r0 + r1)
        mpyf    *ar1,r6,r0              ; r0 = BR * SIN ,
                                        ; (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++,r4            ; (r4 = AI - TI ,
                                        ; BI' = r3)
||      stf     r3,*ar3++
        subf    r0,r5,r3                ; r3 = TR = r5 - r0
        mpyf    *ar1++,r7,r0            ; r0 = BR * COS ,
                                        ; r2 = AR - TR
||      subf    r3,*ar0,r2
bf2end  mpyf    *ar1++(ir0),r6,r1       ; r1 = BI * SIN ,
                                        ; r3 = AR + TR
||      addf    *ar0++,r3,r3
* clear pipeline
        stf     r2,*ar3++               ; BR' = r2 , AI' = r4
||      stf     r4,*ar2++
        addf    r1,r0,r2                ; r2 = TI = r0 + r1
        addf    r2,*ar0,r3              ; r3 = AI + TI , AR' = r3
||      stf     r3,*ar2++
        subf    r2,*ar0,r4              ; r4 = AI - TI , BI' = r3
||      stf     r3,*ar3
        stf     r4,*ar2                 ; AI' = r4


************************************************************************
* ---------------------------- LAST STAGE ---------------------------- *
************************************************************************
        ldi     @input,ar0              ; upper input
        ldi     ar0,ar2                 ; upper output
        ldi     @inputp2,ar1            ; lower input
        ldi     ar1,ar3                 ; lower output
        ldi     @sintp2,ar7             ; pointer to twiddle factors
        ldi     3,ir0                   ; group offset
        ldi     @fg4m2,rc
* fill pipeline
* 1. butterfly: w^0
        addf    *ar0,*ar1,r6            ; AR' = r6 = AR + BR
        subf    *ar1++,*ar0++,r7        ; BR' = r7 = AR - BR
        addf    *ar0,*ar1,r4            ; AI' = r4 = AI + BI
        subf    *ar1++(ir0),*ar0++(ir0),r5 ; BI' = r5 = AI - BI
* 2. butterfly: w^M/4
        addf    *+ar1,*ar0,r3           ; AR' = r3 = AR + BI
        ldf     *-ar7,r1                ; r1 = 0 (for inner loop)
||      ldf     *ar1++,r0               ; r0 = BR (for inner loop)
        rptbd   bf1end                  ; Setup for loop bf1end
        subf    *ar1++(ir0),*ar0++,r2   ; BR' = r2 = AR - BI
        stf     r6,*ar2++               ; (AR' = r6)
||      stf     r7,*ar3++               ; (BR' = r7)
        stf     r5,*ar3++(ir0)          ; (BI' = r5)
* 3. to M. butterfly:
* loop bf1end
        ldf     *ar7++,r7               ; r7 = COS , ((AI' = r4))
||      stf     r4,*ar2++(ir0)
        ldf     *ar7++,r6               ; r6 = SIN , (BR' = r2)
||      stf     r2,*ar3++
        mpyf    *+ar1,r6,r5             ; r5 = BI * SIN ,
                                        ; (AR' = r3)
||      stf     r3,*ar2++
        addf    r1,r0,r2                ; (r2 = TI = r0 + r1)
        mpyf    *ar1,r7,r0              ; r0 = BR * COS ,
                                        ; (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++(ir0),r4       ; (r4 = AI - TI ,
                                        ; BI' = r3)
||      stf     r3,*ar3++(ir0)
        addf    r0,r5,r3                ; r3 = TR = r0 + r5
        mpyf    *ar1++,r6,r0            ; r0 = BR * SIN ,
                                        ; r2 = AR - TR
||      subf    r3,*ar0,r2
        mpyf    *ar1++(ir0),r7,r1       ; r1 = BI * COS ,
                                        ; (AI' = r4)
||      stf     r4,*ar2++(ir0)
        addf    *ar0++,r3,r3            ; r3 = AR + TR , BR' = r2
||      stf     r2,*ar3++
        mpyf    *+ar1,r7,r5             ; r5 = BI * COS ,
                                        ; (AR' = r3)
||      stf     r3,*ar2++
        subf    r1,r0,r2                ; (r2 = TI = r0 - r1)
        mpyf    *ar1,r6,r0              ; r0 = BR * SIN ,
                                        ; (r3 = AI + TI)
||      addf    r2,*ar0,r3
        subf    r2,*ar0++(ir0),r4       ; (r4 = AI - TI ,
                                        ; BI' = r3)
||      stf     r3,*ar3++(ir0)
        subf    r0,r5,r3                ; r3 = TR = r5 - r0
        mpyf    *ar1++,r7,r0            ; r0 = BR * COS ,
                                        ; r2 = AR - TR
||      subf    r3,*ar0,r2
bf1end  mpyf    *ar1++(ir0),r6,r1       ; r1 = BI * SIN ,
                                        ; r3 = AR + TR
||      addf    *ar0++,r3,r3
* clear pipeline
        stf     r2,*ar3++               ; BR' = r2 , (AI' = r4)
||      stf     r4,*ar2++(ir0)
        addf    r1,r0,r2                ; r2 = TI = r0 + r1
        addf    r2,*ar0,r3              ; r3 = AI + TI , AR' = r3
||      stf     r3,*ar2++
        subf    r2,*ar0,r4              ; r4 = AI - TI , BI' = r3
||      stf     r3,*ar3
        stf     r4,*ar2                 ; AI' = r4
************************************************************************
* ---------------------------- END OF FFT ---------------------------- *
************************************************************************
************************************************************************
* --------------------------- BIT REVERSAL --------------------------- *
************************************************************************
        ldi     @fftsiz,ir0
        ldi     2,ir1
        ldi     @input,ar0
        ldi     @output,ar1
        ldi     @fftsiz,rc
        subi    2,rc
        ldf     *+ar0(1),r0
        rptb    bitrv
        ldf     *ar0++(ir0)b,r1
||      stf     r0,*+ar1(1)
bitrv   ldf     *+ar0(1),r0
||      stf     r1,*ar1++(ir1)
        ldf     *ar0++(ir0)b,r1
||      stf     r0,*+ar1(1)
        stf     r1,*ar1
end:    nop
        nop
        nop
        nop
self    br      self
        .end
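
The bit-reversal copy at the end of Example 12-41 relies on bit-reversed (reverse-carry) address updates. As a rough illustration only, the Python sketch below models a reverse-carry addition on element indices rather than on word addresses; the function names are hypothetical and the model ignores the interleaved real/imaginary layout used by the assembly code.

```python
def reverse_carry_add(index, step, bits):
    """Add `step` to `index` with the carry propagating toward the LSB."""
    def rev(value):
        result = 0
        for _ in range(bits):
            result = (result << 1) | (value & 1)
            value >>= 1
        return result
    mask = (1 << bits) - 1
    return rev((rev(index) + rev(step)) & mask)

def bit_reversed_order(n):
    """Visit 0..n-1 in bit-reversed order by stepping n/2 with reverse carry.

    Assumes n is a power of two.
    """
    bits = n.bit_length() - 1
    order, index = [], 0
    for _ in range(n):
        order.append(index)
        index = reverse_carry_add(index, n // 2, bits)
    return order

def reorder(data):
    """Copy `data` into natural order from a bit-reversed result buffer."""
    return [data[i] for i in bit_reversed_order(len(data))]

print(bit_reversed_order(8))
```

Stepping by half the buffer length with reverse carry is what lets a single auto-increment walk the in-place FFT result in the order needed for the output buffer.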


The ’C40 quickly executes FFT lengths up to 1024 points (complex) or 2048
(real), covering most applications, because it can do so almost entirely in
on-chip memory. Table 12-2 summarizes the execution time required for
FFT lengths between 64 and 1024 points for the four algorithms in
Example 12-37, Example 12-39, Example 12-40, and Example 12-41.

Table 12-2. TMS320C40 FFT Timing Benchmarks

                          FFT Timing (in milliseconds)

  Number of   Complex           Complex           Complex           Real
  Points      Radix-2           Radix-2           Radix-4           Radix-2
              (Example 12-37)   (Example 12-41)   (Example 12-39)   (Example 12-40)

      64      0.09112           0.0606            0.0694            0.04
     128      0.2066            0.13316           -                 0.09156
     256      0.46288           0.3058            0.36756           0.20712
     512      1.02636           0.69208           -                 0.45988
    1024      2.25544           1.54516           1.82924           1.01984


12.4.5 Lattice Filters

The lattice form is an alternative way of implementing digital filters; it has
found applications in speech processing, spectral estimation, and other
areas. This discussion uses the notation and terminology of speech
processing applications.

If H(z) is the transfer function of a digital filter that has only poles, then
A(z) = 1/H(z) is a filter that has only zeros; it is called the inverse filter.
The inverse lattice filter is shown in Figure 12-5. These equations describe
the filter in mathematical terms:

    f(i,n) = f(i-1,n) + k(i) b(i-1,n-1)
    b(i,n) = b(i-1,n-1) + k(i) f(i-1,n)

Initial conditions:

    f(0,n) = b(0,n) = x(n)

Final conditions:

    y(n) = f(p,n)

In the above equations, f(i,n) is the forward error, b(i,n) is the backward
error, k(i) is the i-th reflection coefficient, x(n) is the input, and y(n) is
the output signal. The order of the filter (i.e., the number of stages) is p.
In the linear predictive coding (LPC) method of speech processing, the
inverse lattice filter is used during analysis, and the (forward) lattice
filter is used during speech synthesis.
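
The recursion above can be checked with a short Python model. This is a minimal sketch, not the 'C40 implementation: the function name lattice_inverse is hypothetical, and the backward terms b(i,n-1) are assumed to start at zero.

```python
def lattice_inverse(x, k):
    """Inverse (all-zero) lattice filter; returns y(n) = f(p,n) per sample.

    x: input samples x(n); k: reflection coefficients k(1)..k(p).
    """
    p = len(k)
    b_prev = [0.0] * p                  # b(0..p-1, n-1), assumed zero at start
    y = []
    for sample in x:
        f = sample                      # f(0,n) = x(n)
        b = sample                      # b(0,n) = x(n)
        for i in range(1, p + 1):
            f_next = f + k[i - 1] * b_prev[i - 1]   # f(i,n)
            b_next = b_prev[i - 1] + k[i - 1] * f   # b(i,n)
            b_prev[i - 1] = b           # keep b(i-1,n) for the next sample
            f, b = f_next, b_next
        y.append(f)
    return y

print(lattice_inverse([1.0, 0.0, 0.0, 0.0], [0.5, 0.25]))
```

With k = (0.5, 0.25) the impulse response is 1, 0.625, 0.25, 0, which matches A(z) = 1 + (k1 + k1 k2) z^-1 + k2 z^-2 for a two-stage lattice.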
Figure 12-5. Structure of the Inverse Lattice Filter

[Figure: signal-flow diagram of the inverse lattice filter, with input
x(n) = f(0,n), backward term b(p-1,n), and output f(p,n) = y(n).]

Figure 12-6 shows the data memory organization of the inverse lattice filter
on the ’C40.

Figure 12-6. Data Memory Organization for Inverse Lattice Filters

[Figure: two memory columns, reflection coefficients and backward
propagation terms; the high-address entries are k(p) and b(p-1,n-1).]
Example 12-42. Inverse Lattice Filter

*
*  TITLE INVERSE LATTICE FILTER
*
*  SUBROUTINE LATINV
*
*  LATINV == LATTICE FILTER (LPC INVERSE FILTER - ANALYSIS)
*
*  TYPICAL CALLING SEQUENCE:
*
*       load    R2
*       LAJU    LATINV
*       load    AR0
*       load    AR1
*       load    RC
*
*  ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+------------------------------------------------------
*   R2       | f(0,n) = x(n)
*   AR0      | ADDRESS OF FILTER COEFFICIENTS (k(1))
*   AR1      | ADDRESS OF BACKWARD PROPAGATION VALUES (b(0,n-1))
*   RC       | RC = p - 2
*
*  REGISTERS USED AS INPUT: R2, AR0, AR1, RC
*  REGISTERS MODIFIED: R0, R1, R2, R3, RS, RE, RC, AR0, AR1
*  REGISTER CONTAINING RESULT: R2 (f(p,n))
*
*  PROGRAM SIZE: 11 WORDS
*
*  EXECUTION CYCLES: 5 + 3p
*
        .global LATINV
*
*  i = 1
*
LATINV  RPTBD   LOOP                    ; Setup the delayed repeat
                                        ; block loop
        MPYF3   *AR0,*AR1,R0            ; k(1) * b(0,n-1) -> R0
                                        ; Assume f(0,n) -> R2.
        LDF     R2,R3                   ; Put b(0,n) = f(0,n) -> R3.
        MPYF3   *AR0++(1),R2,R1         ; k(1) * f(0,n) -> R1
*
*  2 <= i <= p (Repeat block loop start here)
*
        MPYF3   *AR0,*++AR1(1),R0       ; k(i) * b(i-1,n-1) -> R0
||      ADDF3   R2,R0,R2                ; f(i-1-1,n) + k(i-1)*b(i-1-1,n-1)
                                        ;   = f(i-1,n) -> R2
                                        ; b(i-1-1,n-1) + k(i-1)*f(i-1-1,n)
        ADDF3   *-AR1(1),R1,R3          ;   = b(i-1,n) -> R3
||      STF     R3,*-AR1(1)             ; b(i-1-1,n) -> b(i-1-1,n-1)
*
LOOP    MPYF3   *AR0++(1),R2,R1         ; k(i) * f(i-1,n) -> R1
*
*  i = p + 1 (CLEANUP)
*
        BUD     R11                     ; Delayed return
        ADDF3   R2,R0,R2                ; f(p-1,n) + k(p)*b(p-1,n-1)
                                        ;   = f(p,n) -> R2
*
        ADDF3   *AR1,R1,R3              ; b(p-1,n-1) + k(p)*f(p-1,n)
||      STF     R3,*AR1                 ; b(p-1,n) -> b(p-1,n-1)
        NOP
*
*  end
*
        .end


The structure of the forward lattice filter, shown in Figure 12-7, is similar to
that of the inverse filter (also shown in the figure). These corresponding
equations describe the lattice filter:

    f(i-1,n) = f(i,n) - k(i) b(i-1,n-1)
    b(i,n) = b(i-1,n-1) + k(i) f(i-1,n)

Initial conditions:

    f(p,n) = x(n), b(i,n-1) = 0 for i = 1, ..., p

Final conditions:

    y(n) = f(0,n)

The data memory organization is identical to that of the inverse filter, as
shown in Figure 12-6. Example 12-43 shows the implementation of the
lattice filter on the ’C40.
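
As a cross-check of the forward recursion, the Python sketch below evaluates the equations stage by stage. It is an illustrative model under stated assumptions: the function name lattice_forward is hypothetical, all backward terms start at zero, and b(0,n) is taken equal to f(0,n), as the cleanup comment F(0,N) -> B(0,N-1) in Example 12-43 suggests.

```python
def lattice_forward(x, k):
    """Forward (all-pole) lattice filter; returns y(n) = f(0,n) per sample.

    x: excitation samples x(n) = f(p,n); k: reflection coefficients k(1)..k(p).
    """
    p = len(k)
    b_prev = [0.0] * (p + 1)            # b(0..p, n-1), assumed zero at start
    y = []
    for sample in x:
        f = sample                      # f(p,n) = x(n)
        b_new = [0.0] * (p + 1)
        for i in range(p, 0, -1):
            f = f - k[i - 1] * b_prev[i - 1]          # f(i-1,n)
            b_new[i] = b_prev[i - 1] + k[i - 1] * f   # b(i,n)
        b_new[0] = f                    # b(0,n) = f(0,n)
        b_prev = b_new
        y.append(f)
    return y

# Feeding the inverse filter's impulse response for k = (0.5, 0.25)
# should give back the impulse.
print(lattice_forward([1.0, 0.625, 0.25, 0.0], [0.5, 0.25]))
```

Recovering the unit impulse from 1, 0.625, 0.25, 0 illustrates that the forward lattice undoes the inverse lattice built from the same reflection coefficients.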
Example 12-43. Lattice Filter

*
*  TITLE LATTICE FILTER
*
*  SUBROUTINE LATICE
*
*       LAJU    LATICE
*       LOAD    AR0
*       LOAD    AR1
*       LOAD    RC
*
*  ARGUMENT ASSIGNMENTS:
*
*   ARGUMENT | FUNCTION
*   ---------+------------------------------------------------------
*   R2       | F(P,N) = E(N) = EXCITATION
*   AR0      | ADDRESS OF FILTER COEFFICIENTS (K(P))
*   AR1      | ADDRESS OF BACKWARD PROPAGATION
*            | VALUES (B(P-1,N-1))
*   RC       | RC = P - 2
*
*  REGISTERS USED AS INPUT: R2, AR0, AR1, RC
*  REGISTERS MODIFIED: R0, R1, R2, R3, RS, RE, RC, AR0, AR1
*  REGISTER CONTAINING RESULT: R2 (f(0,n))
*
*  PROGRAM SIZE: 13 WORDS
*
*  EXECUTION CYCLES: 3 + 5P
*
        .global LATICE
*
LATICE  RPTBD   LOOP                    ; Setup the delayed repeat
                                        ; block loop
        MPYF3   *AR0,*AR1,R0            ; K(P) * B(P-1,N-1) -> R0
        SUBF3   R0,R2,R2                ; Assume F(P,N) -> R2
        NOP                             ; F(P,N)-K(P)*B(P-1,N-1)
                                        ;   = F(P-1,N) -> R2
*
*  2 <= I <= P (Repeat block loop start here)
*
        MPYF3   *AR0,R2,R1              ; K(I) * F(I-1,N) -> R1
        MPYF3   *--AR0(1),*-AR1(1),R0   ; K(I-1) *
                                        ;   B(I-1-1,N-1) -> R0
        ADDF3   *AR1--(1),R1,R3         ; B(I-1,N-1) + K(I)*F(I-1,N)
*                                       ;   = B(I,N) -> R3
        STF     R3,*+AR1(2)             ; B(I,N) -> B(I,N-1)
LOOP    SUBF3   R0,R2,R2                ; F(I-1,N)-K(I-1)
                                        ;   *B(I-1-1,N-1)
*                                       ;   = F(I-1-1,N) -> R2.
*
*  I = 1 (CLEANUP)
*
        BUD     R11                     ; Delayed return
        MPYF    *AR0,R2,R1              ; K(1) * F(0,N) -> R1
        ADDF3   *AR1,R1,R3              ; B(0,N-1) + K(1)*F(0,N)
*                                       ;   = B(1,N) -> R3
        STF     R3,*+AR1(1)             ; B(1,N) -> B(1,N-1)
||      STF     R2,*AR1                 ; F(0,N) -> B(0,N-1)
*
*  end
*
        .end
12.5 Programming Tips


Programming style is highly personal and reflects each individual’s
preferences and experiences. The purpose of this section is not to impose
any particular style. Instead, it emphasizes some of the features of the ’C40
that can help in producing faster and/or shorter programs. The tips cover
both C compiler and assembly language programming.


12.5.1 C-Callable Routines 




The ’C40 was designed with a large register file, software stack, and large 
memory space in order to implement a high-level language (HLL) compiler 
easily. The first such implementation supplied is a C compiler. Use of the C 
compiler increases the transportability of applications that have been tested 
on large, general-purpose computers and decreases their porting time. 


To use the compiler efficiently, complete the following steps: 

1) Write the application in the high-level language. 

2) Debug the program. 

3) Estimate whether it runs in real time.

4) If it doesn’t, identify places where most of the execution time is spent. 


5) Optimize these areas by writing assembly language routines that imple- 
ment the functions. 


6) Call the routines from the C program as C functions. 


When writing a C program, you can increase the execution speed by maxi- 
mizing the use of register variables. For more information, refer to the 
TMS320 Floating-Point DSP Optimizing C Compiler User’s Guide (litera- 
ture number SPRU034, due for release 3Q, 1991). 


Certain conventions must be observed in writing a C-callable routine. These 
conventions are outlined in the Runtime Environment chapter of the 
TMS320 Floating-Point DSP Optimizing C Compiler User’s Guide. Certain 
registers are saved by the calling function, and others need to be saved by 
the called function. The C compiler manual helps achieve a clean interface. 
The end result is the readability and natural flow of a high-level language 
combined with the efficiency and special-feature use of assembly language. 
12.5.2 Hints for Optimizing Assembly Code


Each program has particular requirements. Not all possible optimizations
will make sense in every case. The suggestions presented in this section
can be used as a checklist of available software tools.


- Use delayed branches. Delayed branches execute in a single cycle;
  regular branches execute in four. The three instructions that follow the
  delayed branch are executed whether the branch is taken or not. If fewer
  than three useful instructions are available, use the delayed branch and
  append NOPs. Machine cycles (time) are still being saved.


- Use delayed subroutine call and return. Regular subroutine CALL
  and RETS execute in four cycles. The delayed subroutine call can be
  achieved by using link and jump (LAJ) and delayed branches with R11
  register mode (BUD R11) instructions. Both LAJ and BUD instructions
  execute in a single cycle. The rule for using the LAJ instruction is the
  same as for delayed branches.


- Apply the repeat single/block construct. In this way, loops are
  achieved with no overhead. Nesting such constructs will not normally
  increase efficiency, so try to use the feature on the most often
  performed loop. The RPTBD is a single-cycle instruction, and the RPTS
  and RPTB are four-cycle instructions. The usage of RPTBD is similar
  to that of the delayed branches. Note that RPTS is not interruptible, and
  the executed instruction is not refetched for execution. This frees the
  buses for operands.


- Use parallel instructions. It is possible to have a multiply in parallel
  with an add (or subtract) and to have stores in parallel with any multiply
  or ALU operation. This increases the number of operations executed in
  a single cycle. For maximum efficiency, observe the addressing modes
  used in parallel instructions and arrange the data appropriately. You can
  have loads in parallel with any multiply or add (or subtract). Since the
  result of a multiply by one or an add of zero is the same as a load, parallel
  instructions with a data load can be implemented by substituting the
  load instruction with a multiply or an add instruction with one extra
  register containing a one or zero.


- Maximize the use of registers. The registers are an efficient way to
  access scratch-pad memory. Extensive use of the register file facilitates
  the use of parallel instructions and helps avoid pipeline conflicts when
  you use register addressing (register addressing is described in
  subsection 5.1.1 on page 5-3).


- Use the cache. Use the cache especially in conjunction with slow
  external memory. The cache is transparent to the user, so make sure
  that it is enabled.


- Use internal memory instead of external memory. The internal
  memory (2K x 32 bits RAM and 4K x 32 bits ROM) is considerably faster
  to access. In a single cycle, two operands can be brought from internal
  memory. You can maximize performance if you use the DMA in parallel
  with the CPU to transfer data to internal memory before you operate on
  them.


- Avoid pipeline conflicts. If there is no problem with program speed,
  ignore this suggestion. For time-critical operations, make sure that
  cycles are not missed because of conflicts. To identify conflicts, run the
  trace function on the development tools (simulator, emulators) with the
  program tracing option enabled. The tracing immediately identifies the
  pipeline conflicts. Consult the appropriate section of this user’s guide
  for an explanation of the reason for the conflict. You can then take steps
  to correct the problem.


The above checklist is not exhaustive, and it does not address some fea- 
tures outlined in more detail in the different sections of this manual. To learn 
how to exploit the full power of the ’C40, carefully study its architecture, 
hardware configuration, and instruction set described in this user’s guide. 
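
The delayed-branch hint in the checklist can be put into numbers with a small Python calculation. This is a back-of-the-envelope sketch, not a cycle-accurate model: it uses only the cycle counts quoted in the checklist (four cycles for a regular branch, one for a delayed branch, three delay slots) and assumes each NOP costs one cycle and no pipeline conflicts occur. The function names are hypothetical.

```python
REGULAR_BRANCH_CYCLES = 4   # regular branch cost quoted in the checklist
DELAYED_BRANCH_CYCLES = 1   # delayed branch cost quoted in the checklist
DELAY_SLOTS = 3             # instructions executed after a delayed branch

def delayed_branch_overhead(useful_instructions):
    """Cycles spent on the branch itself plus NOPs padding unused slots."""
    if not 0 <= useful_instructions <= DELAY_SLOTS:
        raise ValueError("a delayed branch has exactly three delay slots")
    nops = DELAY_SLOTS - useful_instructions
    return DELAYED_BRANCH_CYCLES + nops

def cycles_saved(useful_instructions):
    """Cycles saved relative to a regular branch."""
    return REGULAR_BRANCH_CYCLES - delayed_branch_overhead(useful_instructions)

for useful in range(DELAY_SLOTS + 1):
    print(useful, "useful slot(s):", cycles_saved(useful), "cycle(s) saved")
```

Under these assumptions every delay slot filled with useful work saves one cycle per branch, which is why padding with NOPs still pays off as long as at least one slot does real work.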
12.6 Peripherals


TMS320C40 peripheral modules include one analysis module, two timers,
six direct memory access (DMA) controllers, and six high-speed
bidirectional communication ports. They are designed to improve system
performance and decrease system cost without reducing the computational
throughput of the CPU. These peripheral modules are controlled through
memory-mapped registers located on the dedicated peripheral bus. The
following subsections present examples that show how to program the
timer, communication port, and DMA operations.


12.6.1 Timers 


There are two general-purpose, 32-bit timers on the 'C40 device. The two
timers are identical and independent of each other (detailed information
on the timers is in Section 9.10 on page 9-45). Each timer is controlled by
three registers: the timer global control register, the timer counter
register, and the timer period register. Pins TCLK0 and TCLK1 of the 'C40
are dedicated to the timers. These pins can be configured as either
general-purpose data I/O pins or timer pins.


If bit 0 and bit 9 of the timer global control register are set to 0, the TCLKx
pin is configured as a general-purpose data I/O pin. Timer counter and peri- 
od registers have no effect on this configuration. Bit 1 of the timer global con- 
trol register is used to configure TCLKx as an input or output pin. If TCLKx 
is configured as an output pin (bit 1 = 1), the data value in bit 2 of the timer 
global control register is shown on TCLKx. If TCLKx is configured as an in- 
put pin (bit 1 = 0), the signal on TCLKx is shown in bit 3 of the timer global 
control register. 


If bit 0 of the timer global control register is set to 1, pin TCLKx is configured
as a timer pin. The frequency of the timer signaling is specified by the timer
period register. However, this assumes that the timer counter register
equals 0 (writing 1 to bit 6 of the timer global control register also resets
the counter register). If the timer counter register holds a nonzero value,
the first period differs from the others. When the counter register
is set to a value greater than the period register, the counter counts up,
rolls over to 0, and continues counting to the period register value.
Therefore, it is important to have correct values in the timer period and
counter registers before starting the timer (writing a 1 to bit 7 of the
timer global control register starts the timer).

The frequency of the timer signaling is determined by the frequency of the
timer input clock and the period register. The following equations are valid
with either an internal or an external timer clock:

f(pulse mode) = f(timer clock) / period register 

f(clock mode) = f(timer clock) / (2 x period register) 
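
The two equations above can be checked with a short host-side calculation. The following Python sketch is illustrative only and is not 'C40 code; the function name and the 10 MHz / 1000 figures are hypothetical values chosen for the example, and the sketch covers only a nonzero period register.

```python
def timer_output_frequency(f_timer_clock_hz, period, clock_mode):
    """Apply the pulse-mode or clock-mode equation for a nonzero period register."""
    if period <= 0:
        # The zero-period case is special and is not covered by these equations.
        raise ValueError("period register must be nonzero for these equations")
    divisor = 2 * period if clock_mode else period
    return f_timer_clock_hz / divisor


# Hypothetical numbers: 10 MHz timer input clock, period register = 1000.
pulse_hz = timer_output_frequency(10_000_000, 1000, clock_mode=False)
clock_hz = timer_output_frequency(10_000_000, 1000, clock_mode=True)
print(pulse_hz, clock_hz)
```

For the same period value, clock mode produces half the pulse-mode frequency, because the output toggles once per period instead of pulsing once per period.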


When the period and counter registers are zero, the operation of the timer
depends on the C/P mode selected. In pulse mode (C/P = 0), TSTAT
is set and remains set. In other words, the frequency is infinite.

In clock mode (C/P = 1), the width of the cycle is 2/f(H1), and the external
clock is ignored. Therefore, the maximum frequency of a timer clock
generated from the internal clock is f(H1)/2. Example 12-44 shows how to
set up the 'C40 timer to generate the maximum-frequency clock through the
TCLKx pin.


Example 12-44. Maximum Frequency Timer Clock Setup 


*  TITLE MAXIMUM FREQUENCY TIMER CLOCK SETUP
*
*  THIS EXAMPLE SHOWS HOW TO SET UP TIMER TO GENERATE MAXIMUM
*  FREQUENCY TIMER CLOCK USING INTERNAL CLOCK, WHERE
*  "TIMER REGISTER" SECTION IS LOCATED FROM 808020H.
*
TIM0_CTL_REG  .usect  "TIMER REGISTER",4
TIM0_CNT_REG  .usect  "TIMER REGISTER",4
TIM0_PRD_REG  .usect  "TIMER REGISTER",8
              .text
              LDI     0,R0
              STI     R0,@TIM0_PRD_REG
              LDI     3C1H,R0
              STI     R0,@TIM0_CTL_REG
              .end


12.6.2 Communication Ports 

To provide direct processor-to-processor communication, the 'C40 has
six parallel bidirectional communication ports (see Chapter 8). Because these
ports have port arbitration units that handle ownership of the
communication port data bus between the processors, the programmer needs to
concentrate only on the internal operation of the communication ports. For
software, these communication ports can be treated as 32-bit on-chip data I/O
FIFO buffers. Reading data from or writing data to a communication port is
simple:

         LDI   @comm_port0_input,R0    ; Read data from comm. port 0
or
         STI   R0,@comm_port0_output   ; Write data to comm. port 0


If the CPU or DMA reads from or writes to the communication port I/O FIFO
and the I/O FIFO is either empty (on a read) or full (on a write), the read/write
execution is extended until data is available in the input FIFO for
a read, or space is available in the output FIFO for a write. Sometimes
this can be used to synchronize the devices. However, it slows down
processing and can even hang the processor. Avoid such situations.


Each ’C40 communication port provides four flags to indicate the status of 
the port: 


ICRDY (input channel ready) 
= 0, the input channel is empty and not ready to be read. 
= 1, the input channel contains data and is ready to read. 


ICFULL (input channel full) 
= 0, the input channel is not full.
= 1, the input channel is full. 


OCRDY (output channel ready) 
= 0, the output channel is full and not ready to be written. 
= 1, the output channel is not full and ready to be written. 


OCEMPTY (output channel empty)
= 0, the output channel is not empty.
= 1, the output channel is empty.


These flags can be used to synchronize CPU/DMA accesses to the
communication port. Example 12-45 shows reading data from the
communication port eight words at a time using the CPU ICFULL interrupt.
Example 12-46 shows writing data to a communication port one word at
a time using the polling method. DMA reads and writes of data from and to
the communication port are shown in the next subsection, subsection 12.6.3,
which discusses the DMA.
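
The meaning of the four flags can be summarized with a small software model. The Python sketch below is a host-side illustration only, not 'C40 code; the class and method names are hypothetical, and the FIFO depth of eight words is an assumption suggested by the eight-word read performed on an ICFULL interrupt.

```python
from collections import deque


class CommPortFifoModel:
    """Toy model of one communication port FIFO (depth is an assumption)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.words = deque()

    def push(self, word):
        # On real hardware a write to a full FIFO stalls; the model raises instead.
        if len(self.words) >= self.depth:
            raise OverflowError("FIFO full: a real access would be extended")
        self.words.append(word)

    def pop(self):
        # On real hardware a read from an empty FIFO stalls; the model raises instead.
        if not self.words:
            raise LookupError("FIFO empty: a real access would be extended")
        return self.words.popleft()

    def input_flags(self):
        """Flags as seen when this FIFO is used as an input channel."""
        return {"ICRDY": int(len(self.words) > 0),
                "ICFULL": int(len(self.words) == self.depth)}

    def output_flags(self):
        """Flags as seen when this FIFO is used as an output channel."""
        return {"OCRDY": int(len(self.words) < self.depth),
                "OCEMPTY": int(len(self.words) == 0)}


fifo = CommPortFifoModel()
empty_in = fifo.input_flags()
empty_out = fifo.output_flags()
for value in range(8):
    fifo.push(value)
full_in = fifo.input_flags()
full_out = fifo.output_flags()
print(empty_in, full_in, full_out)
```

Checking ICRDY before a read, or OCRDY before a write, is what keeps an access from being extended while the FIFO is empty or full.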

Example 12-45. Read Data from Communication Port With CPU ICFULL Interrupt


*
*  TITLE READ DATA FROM COMMUNICATION PORT WITH CPU
*        ICFULL INTERRUPT
*
*  THIS EXAMPLE ASSUMES THE ICFULL 0 INTERRUPT VECTOR IS SET IN THE
*  CPU INTERRUPT VECTOR TABLE. THE EIGHT DATA ARE READ IN
*  WHENEVER THE DATA IS FULL IN COMM PORT 0 INPUT FIFO.
*
         LDA   @COMM_PORT0_CTL,AR2     ; Load comm port 0
                                       ; control reg address
         LDA   @COMM_PORT0_INPUT,AR0   ; Load comm port 0
                                       ; input FIFO address
         LDA   @INTERNAL_RAM,AR1       ; Load internal RAM address
         AND3  0F7H,*AR2,R9            ; Unhalt comm port 0
                                       ; input channel
         STI   R9,*AR2
         OR    04H,IIE                 ; Enable ICRDY 0 interrupt
         OR    02000H,ST               ; Enable CPU global interrupt

ICFULL0  PUSH  ST
         PUSH  RS
         PUSH  RE
         PUSH  RC
         RPTBD READ                    ; Setup for loop READ
         LDI   6,RC                    ; Set repeat counter
         LDI   *AR0,R10                ; Read data from comm port 0
                                       ; input
         NOP
READ     LDI   *AR0,R10                ; Read data from comm port 0
                                       ; input
      || STI   R10,*AR1++(1)           ; Store data into internal RAM
         STI   R10,*AR1++(1)           ; Store data into internal RAM
         POP   RC
         POP   RE
         POP   RS
         POP   ST
         RETI

Example 12-46. Write Data to Communication Port With Polling Method


*
*  TITLE WRITE DATA TO COMMUNICATION PORT WITH POLLING METHOD
*
*  THE BIT 8 OF COMMUNICATION PORT 0 CONTROL REGISTER WILL BE
*  SET ONLY WHEN THE OUTPUT FIFO IS FULL. THIS EXAMPLE CHECKS
*  THIS BIT TO MAKE SURE THERE IS SPACE AVAILABLE IN
*  OUTPUT FIFO.
*
           LDA   @COMM_PORT0_CTL,AR2     ; Load comm port 0 control reg
                                         ; address
           LDA   @COMM_PORT0_OUTPUT,AR0  ; Load comm port 0 output
                                         ; FIFO address
           LDA   @INTERNAL_RAM,AR1       ; Load internal RAM address
           AND3  0EFH,*AR2,R9            ; Unhalt comm port 0 output
                                         ; channel
           STI   R9,*AR2
           LDI   0100H,R9                ; Load mask for bit 8
WAIT:      TSTB  *AR2,R9                 ; Check if output FIFO is full
           BZD   WAIT                    ; If yes, check again
WRITE_COMM LDI   *AR1++(1),R10           ; Read data from internal RAM
           STI   R10,*AR0                ; Store data into comm port
                                         ; 0 output


12.6.3 Direct Memory Access


The ’C40 direct memory access (DMA) coprocessor supports six DMA 
channels (detailed information on DMA is in Chapter 9). These channels 
perform transfers to and from anywhere in the processor memory map. The 
DMA coprocessor is a self-programming device that allows data transfers 
to occur without any intervention from the CPU. It also provides a special 
split-mode to support 12 DMA channels for communication port memory 
transfer. This section contains examples of DMA programs, from a very
simple single-block memory-to-memory transfer to a sophisticated memory
transfer with autoinitialization.

Example 12-47 shows one way to set up DMA channel 2 to initialize
an array to zero. This DMA transfer is set up to have higher priority than
CPU operation and to generate an interrupt flag, DMA INT2, after the
transfer is completed. The DMA control register is set to 00C40007H (refer to
DMA control register bit functions in Table 9-1 on page 9-8 for further
information on this setup).


Example 12-47. Array Initialization With DMA


*
*  TITLE ARRAY INITIALIZATION WITH DMA
*
*  THIS EXAMPLE INITIALIZES A 128 ELEMENTS ARRAY TO ZERO. THE DMA
*  TRANSFER IS SET UP TO HAVE HIGHER PRIORITY OVER CPU OPERATION.
*  THE DMA INT2 INTERRUPT FLAG IS SET TO 1 AFTER THE TRANSFER IS
*  COMPLETED.
*
         .data
DMA2     .word 001000C0H       ; DMA channel 2 map address
CONTROL  .word 00C40007H       ; DMA register initialization data
SOURCE   .word ZERO
SRC_IDX  .word 0
COUNT    .word 128
DESTIN   .word ARRAY
DES_IDX  .word 1
ZERO     .word 0.0             ; Array initialization value 0.0
         .bss  ARRAY,128
         .text
START    LDP   @DMA2           ; Load data page pointer
         LDA   @DMA2,AR0       ; Point to DMA channel 2 registers
         LDI   @SOURCE,R0      ; Initialize DMA source register
         STI   R0,*+AR0(1)
         LDI   @SRC_IDX,R0     ; Initialize DMA source index
                               ; register
         STI   R0,*+AR0(2)
         LDI   @COUNT,R0       ; Initialize DMA count register
         STI   R0,*+AR0(3)
         LDI   @DESTIN,R0      ; Initialize DMA destination
                               ; register
         STI   R0,*+AR0(4)
         LDI   @DES_IDX,R0     ; Initialize DMA destination
                               ; index register
         STI   R0,*+AR0(5)
         LDI   @CONTROL,R0     ; Start DMA channel 2 transfer
         STI   R0,*AR0
         .end

The DMA transfer can be synchronized with external interrupts,
communication port ICRDY/OCRDY signals, and timer interrupts. To enable
this feature, the SYNCH MODE field, bits 6-7, of the DMA control register
must be configured to a proper value (Table 9-1 on page 9-8), and the
corresponding bits of the DMA interrupt enable (DIE) register must be set.
Example 12-48 sets up DMA channel 4 read synchronization with the
communication port ICRDY signal. The DMA is set up to continuously transfer
data from the communication port input register until the START field, bits
22-23 of the DMA control register, is changed by the CPU.


Example 12-48. DMA Transfer With Communication Port ICRDY Synchronization 


*
*  TITLE DMA TRANSFER WITH COMMUNICATION PORT ICRDY
*        SYNCHRONIZATION
*
*  THIS EXAMPLE SETS UP DMA CHANNEL 4 TO TRANSFER DATA FROM
*  COMMUNICATION PORT INPUT REGISTER TO INTERNAL RAM WITH ICRDY
*  SIGNAL READ SYNCHRONIZATION. THE TRANSFER MODE OF THE DMA IS
*  SET TO 00. THEREFORE THE TRANSFER WON'T STOP UNTIL THE START
*  BITS OF THE DMA CONTROL REGISTER IS CHANGED.
*
         .data
DMA4     .word 001000E0H       ; DMA channel 4 map address
CONTROL  .word 00C00040H       ; DMA register initialization data
SOURCE   .word 00100081H
SRC_IDX  .word 0
COUNT    .word 0               ; Transfer counter is set to
                               ; largest value
DESTIN   .word 002FF800H
DES_IDX  .word 1
         .text
START    LDP   @DMA4           ; Load data page pointer
         LDA   @DMA4,AR0       ; Point to DMA channel 4 registers
         LDI   @SOURCE,R0      ; Initialize DMA source register
         STI   R0,*+AR0(1)
         LDI   @SRC_IDX,R0     ; Initialize DMA source index register
         STI   R0,*+AR0(2)
         LDI   @COUNT,R0       ; Initialize DMA count register
         STI   R0,*+AR0(3)
         LDI   @DESTIN,R0      ; Initialize DMA destination register
         STI   R0,*+AR0(4)
         LDI   @DES_IDX,R0     ; Initialize DMA destination index
                               ; register
         STI   R0,*+AR0(5)
         LDI   @CONTROL,R0     ; Start DMA channel 4 transfer
         STI   R0,*AR0
         LDHI  010H,DIE        ; Enable ICRDY 4 read sync.
         .end

If external interrupt signals are used for DMA transfer synchronization, then
pins IIOF0-3 must also be configured as interrupt pins.

Besides memory-mapped addressing, the 'C40 DMA split mode is another way to
transfer data from and to the communication port. When the split-mode bit of
the DMA control register is set, the DMA is separated into primary and
auxiliary channels. The primary channel transfers data from memory to the
communication port output register, and the auxiliary channel transfers data
from the communication port to memory. The communication port number is
selected in bits 15-17 of the DMA control register.
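
The control-word fields named in this section can be pulled out of a 32-bit value with shifts and masks. The Python sketch below is a host-side illustration under stated assumptions: it decodes only the three fields whose bit positions are given in the text (SYNCH MODE in bits 6-7, the communication port number in bits 15-17, and START in bits 22-23), and the function and key names are hypothetical. Table 9-1 remains the reference for the full register layout.

```python
def bit_field(word, low_bit, high_bit):
    """Return bits low_bit..high_bit (inclusive) of a 32-bit word."""
    width = high_bit - low_bit + 1
    return (word >> low_bit) & ((1 << width) - 1)


def decode_dma_control(word):
    """Decode only the fields whose bit positions are stated in this section."""
    return {
        "SYNCH_MODE": bit_field(word, 6, 7),
        "COM_PORT": bit_field(word, 15, 17),
        "START": bit_field(word, 22, 23),
    }


# Control words taken from Example 12-48 and Example 12-49.
icrdy_sync = decode_dma_control(0x00C00040)
split_mode = decode_dma_control(0x03CDD0D4)
print(icrdy_sync)
print(split_mode)
```

Decoding the split-mode control word of Example 12-49 this way yields communication port 3, which agrees with the description of that example.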


Example 12-49 shows how to set up DMA channel 1 into split mode. The 
DMA primary channel transfers data from internal RAM to communication 
port 3 using external interrupt INT2 synchronization and bit-reversed ad- 
dressing. The DMA auxiliary channel transfers data from communication 
port 3 to internal RAM using external interrupt INT3 synchronization and lin- 
ear addressing. 


Example 12-49. DMA Split-Mode Transfer With External Interrupt Synchronization 


*
*  TITLE DMA SPLIT-MODE TRANSFER WITH EXTERNAL INTERRUPT
*        SYNCHRONIZATION
*
*  THIS EXAMPLE SETS UP DMA CHANNEL 1 TO SPLIT-MODE. THE PRIMARY
*  CHANNEL TRANSFERS DATA FROM INTERNAL RAM TO COMM PORT 3 OUTPUT
*  REGISTER WITH EXTERNAL INTERRUPT INT2 SYNCHRONIZATION AND BIT-
*  REVERSED ADDRESSING. THE AUXILIARY CHANNEL TRANSFERS DATA FROM
*  COMMUNICATION PORT 3 INPUT REGISTER TO INTERNAL RAM WITH
*  EXTERNAL INTERRUPT INT3 SYNCHRONIZATION AND LINEAR ADDRESSING.
*
         .data
DMA1     .word 001000B0H       ; DMA channel 1 map address
CONTROL  .word 03CDD0D4H       ; DMA register initialization data
SOURCE   .word 002FFC00H
SRC_IDX  .word 08H             ; The same value as IR0 for bit-reversed
COUNT    .word 8
DESTIN   .word 002FF800H
DES_IDX  .word 1
AUX_CNT  .word 8
         .text
START    LDP   @DMA1           ; Load data page pointer
         LDA   @DMA1,AR0       ; Point to DMA channel 1 registers
         LDI   @SOURCE,R0      ; Initialize DMA primary source register
         STI   R0,*+AR0(1)
         LDI   @SRC_IDX,R0     ; Initialize DMA primary source index reg
         STI   R0,*+AR0(2)
         LDI   @COUNT,R0       ; Initialize DMA primary count register
         STI   R0,*+AR0(3)
         LDI   @DESTIN,R0      ; Initialize DMA aux destination
                               ; register
         STI   R0,*+AR0(4)
         LDI   @DES_IDX,R0     ; Initialize DMA aux destination
                               ; index register
         STI   R0,*+AR0(5)
         LDI   @AUX_CNT,R0     ; Initialize DMA auxiliary count
                               ; register
         STI   R0,*+AR0(7)
         LDI   @CONTROL,R0     ; Start DMA channel 1 transfer
         STI   R0,*AR0
         LDI   01100H,IIF      ; Configure INT2 and INT3 as
                               ; interrupt pins
         LDI   0A0H,DIE        ; Enable INT2 read and INT3 write sync.
         .end


An advantage of the ’C40 DMA is the autoinitialization feature. This allows 
you to set up the DMA transfer in advance and makes the DMA operation 
100 percent independent from the CPU. When the DMA is operating in auto- 
initialization mode, the link pointer and auxiliary link pointer are used to ini- 
tialize the registers that control the DMA operation. The link pointer may be 
incremented (AUTOINIT STATIC = 0 — shown in Table 9—1 on page 9-8) 
during autoinitialization or held constant (AUTOINIT STATIC = 1) during au- 
toinitialization. This option allows autoinitialization values to be stored in se- 
quential memory locations or in stream-oriented devices such as the 
on-chip communication ports or external FIFOs. When DMA SYNC MODE
is enabled, the DMA autoinitialization operation can be configured to
synchronize with the same signal, too. Example 12-50 sets up DMA channel 0
to wait for the communication port to input the initialization value. After DMA
autoinitialization is complete, the DMA channel starts transferring data from
the communication port input register to internal RAM.

Example 12-50. DMA Autoinitialization With Communication Port ICRDY


*
*  TITLE DMA AUTOINITIALIZATION WITH COMMUNICATION PORT ICRDY
*
*  THIS EXAMPLE SETS UP DMA CHANNEL 0 TO WAIT FOR COMMUNICATION
*  PORT TO INPUT THE INITIALIZATION VALUE. THE DMA AUTOINITIAL-
*  IZATION AND TRANSFER ARE BOTH DRIVEN BY ICRDY 0 FLAG. AFTER
*  DMA AUTOINIT IS COMPLETED, THE DMA CHANNEL STARTS TRANSFERRING
*  DATA FROM COMM PORT INPUT REGISTER TO INTERNAL RAM WITH ICRDY
*  0 READ SYNCHRONIZATION. THE VALUES IN COMM PORT 0 INPUT FIFO
*  SHOULD BE:
*
*  SEQUENCE | VALUE
*  ---------|-----------------------------------------------
*     1     | 00C40047H (STOP AFTER TRANSFER COMPLETED)
*           | OR 00C4054BH (REPEAT AFTER TRANSFER COMPLETED)
*     2     | 00100041H
*     3     | 0H
*     4     | 20H
*     5     | 002FF800H
*     6     | 1H
*     7     | 00100041H
*
          .data
DMA0      .word 001000A0H      ; DMA channel 0 map address
DMA_INIT  .word 0004054BH      ; DMA initialization control word
LINK      .word 00100041H      ; Comm port input register address
DMA_START .word 00C4054BH      ; DMA start control word
          .text
START     LDP   @DMA0          ; Load data page pointer
          LDA   @DMA0,AR0      ; Point to DMA channel 0 registers
          LDI   @DMA_INIT,R0   ; Initialize DMA control register
          STI   R0,*AR0
          LDI   @LINK,R0       ; Initialize DMA link pointer
          STI   R0,*+AR0(6)
          LDI   @DMA_START,R0  ; Start DMA channel 0 transfer
          STI   R0,*AR0
          LDI   01H,DIE        ; Enable ICRDY 0 read sync.
          .end


The DMA autoinitialization and transfer will continue executing if 
the DMA autoinitialization is still enabled. Therefore, a DMA setup like the 
one in Example 12—50 can make it possible for the DMA operation to be 
controlled by an external device through the communication port. 
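
The seven values that an external device would send for one autoinitialization pass can be assembled on the host side as a simple list. The Python sketch below is illustrative only; the field names and the helper are hypothetical, and the word order (control, source, source index, count, destination, destination index, link pointer) is taken from the value sequence and register offsets used in Example 12-50.

```python
# Order of the words consumed by one autoinitialization pass (assumed from
# the sequence table and register offsets in Example 12-50).
LINK_ENTRY_FIELDS = (
    "control",
    "source",
    "source_index",
    "count",
    "destination",
    "destination_index",
    "link_pointer",
)


def make_link_entry(**fields):
    """Return the 32-bit words of one autoinitialization entry, in order."""
    missing = [name for name in LINK_ENTRY_FIELDS if name not in fields]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return [fields[name] & 0xFFFFFFFF for name in LINK_ENTRY_FIELDS]


# Values from the sequence table of Example 12-50 (stop after transfer).
entry = make_link_entry(
    control=0x00C40047,
    source=0x00100041,
    source_index=0x0,
    count=0x20,
    destination=0x002FF800,
    destination_index=0x1,
    link_pointer=0x00100041,
)
print([f"{word:08X}H" for word in entry])
```

Keeping the fields named makes it harder to send the values in the wrong order, which matters because the DMA consumes them strictly in sequence.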

With the autoinitialization feature, the 'C40 DMA can support a variety of
DMA operations without slowing down CPU computation. A good example
is a DMA transfer triggered by a single interrupt signal. Usually, this is
achieved by starting the DMA activity from a CPU interrupt service routine,
but this uses CPU time. With the autoinitialization feature, the 'C40 DMA can
achieve this kind of setup without interrupting the CPU, as shown in
Example 12-51. One method is to set up a single interrupt-driven dummy
DMA transfer with autoinitialization. When the interrupt signal is set, the
DMA completes the dummy DMA transfer and starts the autoinitialization
for the desired DMA transfer.


Example 12-51. Single-Interrupt-Driven DMA Transfer


*
*  TITLE SINGLE INTERRUPT-DRIVEN DMA TRANSFER
*
*  THIS EXAMPLE SETS UP A DUMMY DMA TRANSFER FROM INTERNAL RAM
*  TO THE SAME MEMORY WITH EXTERNAL INT 0 SYNCHRONIZATION AND
*  AUTOINITIALIZATION FOR TRANSFERRING 64 DATA FROM LOCAL MEMORY
*  TO INTERNAL RAM. AFTER THE SECOND TRANSFER IS COMPLETED, THE
*  DMA IS RE-INITIALIZED TO FIRST DMA TRANSFER SETUP.
*
          .data
DMA5      .word 001000F0H      ; DMA channel 5 map address
DMA_INIT  .word 0000004BH      ; DMA initialization control word
LINK      .word DMA1           ; 1st DMA link list address
DMA_START .word 00C0004BH      ; DMA start control word
DMA1      .word 00C0004BH      ; 1st dummy DMA transfer link list
          .word 002FF800H
          .word 00000000H
          .word 00000001H
          .word 002FF800H
          .word 00000000H
          .word DMA2
DMA2      .word 00C4000BH      ; The desired DMA transfer link
          .word 00400000H      ; list
          .word 00000001H
          .word 00000040H
          .word 002FF800H
          .word 00000001H
          .word DMA1
          .text
START     LDP   @DMA5          ; Load data page pointer
          LDA   @DMA5,AR0      ; Point to DMA channel 5 registers
          LDI   @DMA_INIT,R0   ; Initialize DMA control register
          STI   R0,*AR0
          LDI   @LINK,R0       ; Initialize DMA link pointer
          STI   R0,*+AR0(6)
          LDI   @DMA_START,R0  ; Start DMA channel 5 transfer
          STI   R0,*AR0
          LDI   01H,IIF        ; Configure INT0 as interrupt pin
          LDHI  0800H,DIE      ; Enable INT 0 read sync. for
                               ; DMA channel 5
          .end


The TMS320C40's advanced interface design can be used to implement a
wide variety of system configurations. Its two external buses and DMA
capability provide a flexible parallel 32-bit interface to byte- or word-wide
devices; the communication ports provide a glueless interface to other 'C40s;
and the interrupt interface, communication ports, and general-purpose
digital I/O provide communication with a multitude of peripherals.


This chapter describes how to use the ’C40’s interfaces to connect to vari- 
ous external devices. Specific discussions include implementation of paral- 
lel interface to devices with and without wait states, parallel processing 
through the communication ports and port control logic, and system control 
function circuit design. 


13.1 System Configuration Options Overview


The ’C40 interfaces connect to a wide variety of device types. Each of these 
interfaces is tailored to a particular family of devices. 


13.1.1 Categories of Interfaces on the TMS320C40 


The interface types on the 'C40 fall into several different categories, de- 
pending on the devices to which they are intended to be connected. Each 
interface comprises one or more signal lines, which transfer information and 
control its operation. Shown in Figure 13—1 are the signal line groupings for 
each of these interfaces. 


Figure 13-1. External Interfaces to the TMS320C40 
Note: n = 0 for Communication Port 0, n = 1 for Communication Port 1, etc.


Each interface is independent of the others, and different operations may 
be performed simultaneously on each interface. These pins are defined in 
more detail in Table 14-2 on page 14-5. 


The global and local buses implement the primary memory-mapped
interfaces to the device. These interfaces allow external devices such as
DMA controllers and other microprocessors to share resources with one or
more 'C40s through a common bus.


The devices that can be interfaced to the 'C40 include memory, DMA
devices, and numerous parallel and serial peripherals and I/O devices. In
addition, 'C40s can interface directly with each other, without external
logic, through their communication ports or their external flag pins
IIOF(0-3). Figure 13-2 illustrates a typical configuration of a 'C40 system
with different types of external devices and the interfaces to which they
are connected.


Figure 13-2. Possible System Configurations


The block diagram in Figure 13-2 represents a more or less fully
expanded system. In an actual design, any subset or superset of the
illustrated configuration may be used.



13.2 Boot Loader Description and External ROM Interfacing 
13.2.1 TMS320C40 Boot Loader Description/Operation 


The boot loader provided in the on-chip ROM of the ’C40 can load and ex- 
ecute source programs that are received from a host processor, 
inexpensive ROM, or other standard memory devices. The ’C40 boot 
loader functions primarily as either a memory boot loader or a 
communication port boot loader. 

-  The memory boot loader supports user-definable byte, half-word, and
   full-word data formats, which allow the flexibility to load a source
   program from memories having widths of 8, 16, and 32 bits. The
   source programs to be loaded reside in one of six predefined memory
   locations: 0x0030 0000, 0x4000 0000, 0x6000 0000, 0x8000 0000,
   0xA000 0000, and 0xC000 0000, as listed in Table 13-1.

-  The communication port boot loader waits for the first data input from
   one of the six communication port channels and uses that channel to
   perform the boot load. The format of the incoming data stream is similar
   to that for a memory data stream except that the source memory width is
   excluded (the format is described in Table 13-2, page 13-7).


Table 13-1 lists the pin values on IIOF(3-1) that select the location from
which the source program will be loaded.


Table 13-1. Boot Loader Mode Selection Using Pins IIOF(3-1) 


External Pin        Source Program Location

                    Load source program from address 0030 0000h
                    Load source program from address 4000 0000h
                    Reserved


13.2.2 Boot Load Sequence


A general sequence of events in boot loading a source program is as
follows:


1)  Select the boot loader mode by resetting the processor while driving
    the on-chip ROM enable pin (ROMEN) high. The status of external pins
    IIOF(3-1) indicates where to find the source program to be loaded
    (memory or communication port). These options are listed in
    Table 13-1. (Pins IIOF(3-1) are read as the IIOF flags in the CPU
    IIF register, described in Table 3-6 on page 3-13.)

2)  The boot loader takes the following steps to determine the source
    program's location:

    a)  If an IIF(3-1) value of 110 to 001 binary (6 to 1) is found, the
        source program is loaded from the corresponding memory address
        shown in the top six lines of Table 13-1.

    b)  If an IIF(3-1) value of 000 binary (0) is found, the boot program
        is exited.

    c)  If none of the combinations 000 to 110 binary is found, the boot
        loader program assumes loading will be via a communication port,
        and it starts checking communication port input channels (in the
        order port 0 through port 5). If no input is found from a
        communication port, the program returns to checking the status of
        the IIOF(3-1) pins again.

3)  When the source program's data stream is found, the program is loaded
    at the address found in the fifth word of the data stream (format shown
    in Table 13-2) using the bus width specified in the first word (8, 16,
    or 32 bits wide). The first five words of the source program specify its
    loading and execution criteria. The remaining words are the source
    program(s) and vector table pointers, as shown in Table 13-2.

4)  An IACK instruction is executed. The IACK indicates the completion of
    the boot load sequence.

5)  The source program is then executed (the entry point is the first word
    of the first loaded program).
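
The source-selection decision in step 2 can be sketched as a small function. The Python code below is a host-side illustration only and is not the on-chip boot loader; the function name is hypothetical, and the pin-value-to-address table is passed in as a parameter because only the 110 (binary) to 0030 0000h pairing is confirmed by the examples in this section. The remaining rows would have to be filled in from Table 13-1.

```python
def select_boot_source(iiof_3_1, memory_table):
    """Classify a 3-bit IIOF(3-1) value as described in step 2.

    memory_table maps pin values 1..6 to load addresses (contents are
    supplied by the caller from Table 13-1).
    """
    if not 0 <= iiof_3_1 <= 7:
        raise ValueError("IIOF(3-1) is a 3-bit value")
    if 1 <= iiof_3_1 <= 6:
        # Step 2a: load from the memory address listed for this pin value.
        return ("memory", memory_table[iiof_3_1])
    if iiof_3_1 == 0:
        # Step 2b: the boot program is exited.
        return ("exit", None)
    # Step 2c: any other combination (all ones) means communication port load.
    return ("comm_port", None)


# Only this pairing is confirmed by the memory-load examples in this section.
known_rows = {0b110: 0x00300000}
print(select_boot_source(0b110, known_rows))
print(select_boot_source(0b111, known_rows))
```

On the device the communication port case is a polling loop over ports 0 through 5 that falls back to rechecking the pins; the sketch only classifies the pin value.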


The data stream with its source program(s) should be in the format shown
in Table 13-2. The contents of words 4 through n vary for the different
source programs loaded throughout the entire data stream. The first three
words and the last three words are nonvariables that affect each of the
source-program blocks. The eight least significant bits of the first word
specify the memory width. If byte or half-word width is selected, the
loading sequence is from LSBs to MSBs.


Table 13-2. Structure of Source Program Data Stream


Word        Content

1           Memory width where source program resides (8, 16, or 32 bits
            wide)

2           Value to set in the global memory interface control register
            (shown in Figure 7-2, page 7-7, and Table 7-3).

3           Value to set in the local memory interface control register
            (shown in Figure 7-2, page 7-7, and Table 7-3).

4           Size of the first source-program block.

5           Address at which the first source-program block is to be
            loaded.

6 to n      Contents of the first source-program block.

n+1         Word of all zeroes. (Note that if several source-program
            blocks were used, word n above would be the last word of the
            last source-program block. Each source-program block would
            have the format shown in words 4 through n. Then this word of
            all zeroes follows the last source-program block.)

n+2         IVTP value (interrupt vector table pointer, see Section 3.2 on
            page 3-15).

n+3         TVTP value (trap vector table pointer, see Section 3.2 on
            page 3-15).

n+4         Memory location for IACK instruction (see IACK instruction in
            Chapter 11).


Each source program in a multiple-block program transfer can be loaded
to a different specified destination. Each program block specifies its own
block size and destination address at the beginning of the block. End the
entire block program loader function by appending an all-zero word
(0000 0000h) to the last block (only).


The third-to-last and second-to-last words of the source memory define the
interrupt vector table pointer (IVTP) and the trap vector table pointer
(TVTP). The last word of the source memory defines the memory location for
the IACK instruction. Since the IACK instruction brings down the IACK signal
as data is read, the memory location specified in the IACK instruction has to
be in external memory that is available in the system in order to bring the
IACK signal low. Then the processor begins execution of the first code block.
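
The data stream layout described above can be assembled on a host before it is written to ROM. The Python sketch below is illustrative only: the function name is hypothetical, the block size is assumed to be a count of 32-bit words, and the per-block order (size first, then destination address, then the block contents) follows the statement that the load address is found in the fifth word of the stream. The control-word and pointer values are those quoted for the memory-load examples in subsection 13.2.3; the block contents are placeholder zeros.

```python
def build_boot_stream(width_bits, global_ctl, local_ctl, blocks,
                      ivtp, tvtp, iack_addr):
    """Return the boot data stream as a list of 32-bit words.

    blocks is a list of (destination_address, [words...]) tuples.
    """
    if width_bits not in (8, 16, 32):
        raise ValueError("memory width must be 8, 16, or 32 bits")
    stream = [width_bits, global_ctl, local_ctl]
    for destination, words in blocks:
        if not words:
            # A zero size would be indistinguishable from the terminator word.
            raise ValueError("empty block would look like the all-zero terminator")
        stream.append(len(words))      # block size (assumed to be in words)
        stream.append(destination)     # destination address
        stream.extend(words)           # block contents
    stream.append(0)                   # all-zero word after the last block
    stream.extend([ivtp, tvtp, iack_addr])
    return [word & 0xFFFFFFFF for word in stream]


# Placeholder block contents; sizes and addresses follow subsection 13.2.3.
stream = build_boot_stream(
    width_bits=32,
    global_ctl=0x1D7BC9F0,
    local_ctl=0x1D739250,
    blocks=[(0x002FF840, [0] * 294), (0x002FF800, [0] * 64)],
    ivtp=0x002FF800,
    tvtp=0x002FF800,
    iack_addr=0x00300000,
)
print(len(stream), [f"{w:08X}" for w in stream[:5]])
```

For a communication port load the same stream applies without the leading memory-width word, as noted in the boot loader description.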


13.2.3 Examples of External Memory Loads


Example 13-1, Example 13-2, and Example 13-3 show memory images for
memory configured as byte wide, 16 bits wide, and 32 bits wide,
respectively. These examples assume that:

-  The status of the IIOF(3-1) pins is 110 (binary) after reset is
   deasserted (memory load from 0030 0000h; see Table 13-1 on page 13-5).

-  The source program resides at memory location 0030 0000h and
   defines the following:

   -  Memory width for boot loader: 8, 16, or 32 bits
   -  Global bus memory that requires one software wait state, external
      RDY (SWW = 11), page size = 64K for both STRB0 and STRB1,
      and active address range = 1G for both STRB0 and STRB1.
   -  Local memory bus that requires two software wait states (SWW =
      01), page size = 32K, and active address range = 1G for both
      STRB0 and STRB1.
   -  First block program with 294 words in length and destination
      address at 002F F840h.
   -  Second block program with 64 words in length and destination
      address at 002F F800h.
   -  IVTP and TVTP, which are overlapped and point to the beginning
      of the on-chip RAM.
   -  Memory location of 0030 0000h for the IACK instruction.


13.2.4 Communication Port Loading 


A value of all ones on IIOF(3-1) signals that the source program is being
transmitted via a communication port. Bringing all three of the IIOF(3-1)
pins high also allows the pins to be used as interrupt lines without any
external decode logic. With pins IIOF(3-1) all high at reset, the 'C40
determines which channel contains the program by polling the input level of
each port. The input data sequence of the communication port boot loader is
the same as that of the memory boot loader except for the source memory
width definition (because the memory width is fixed on the communication
port boot loader).


13.2.5 External ROM Interfacing to the TMS320C40


When the 'C40's ROMEN input pin is high and RESETLOC(1,0) = 00 (binary)
during reset, the memory boot loader can load programs stored in off-chip
ROM to any valid external or internal memory in the 'C40's memory map.


Regardless of the ROM width used (8, 16, or 32 bits), the
8 LSBs of the first word read from the data stream specify the memory width.
As shown in the three data stream examples starting with Example 13-1 on
page 13-10, the first entry for each memory width is:


-  8-bit memories: 08h
-  16-bit memories: 0010h
-  32-bit memories: 00000020h


If 8- or 16-bit ROMs are used, the loading sequence is from LSBs to MSBs.
The boot loader reads the contents of 16-bit wide memories (least
significant half word first) and packs each pair of 16-bit half words to
make a 32-bit word before loading each word to memory. Similarly, the boot
loader reads the contents of byte-wide memories (least significant byte
first) and packs each group of four bytes into a 32-bit word before loading
each word to memory. Since the boot loader does the packing before loading,
no external hardware is needed to pack the loaded bytes into a 32-bit word.
For 32-bit wide ROMs, no byte packing is necessary, because the ROM data
width matches that of the ’C40.
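
The packing order described above can be illustrated with a small Python model. This is a sketch of the arithmetic only, not the boot loader itself; the function name pack_words is hypothetical, and the sample bytes are simply the global memory bus control word 1D7BC9F0h from the examples, split least significant unit first.

```python
def pack_words(units, width_bits):
    """Pack ROM read units (least significant unit first) into 32-bit words.

    units      -- values read from the ROM, one per ROM address
    width_bits -- ROM data width: 8, 16, or 32
    """
    if width_bits not in (8, 16, 32):
        raise ValueError("width must be 8, 16, or 32 bits")
    per_word = 32 // width_bits          # 4 bytes, 2 half words, or 1 word
    if len(units) % per_word:
        raise ValueError("input does not contain a whole number of words")
    mask = (1 << width_bits) - 1
    words = []
    for i in range(0, len(units), per_word):
        word = 0
        for k, unit in enumerate(units[i:i + per_word]):
            # unit k lands k * width_bits above bit 0 (LSBs are read first)
            word |= (unit & mask) << (k * width_bits)
        words.append(word)
    return words


# Byte-wide ROM: F0h, C9h, 7Bh, 1Dh are read in that order.
print(hex(pack_words([0xF0, 0xC9, 0x7B, 0x1D], 8)[0]))
# 16-bit ROM: C9F0h is read first, then 1D7Bh.
print(hex(pack_words([0xC9F0, 0x1D7B], 16)[0]))
```

Both calls yield the same 32-bit value, which is the point of the packing step: the stored program is identical regardless of ROM width.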


For 16-bit ROMs, the data read is expected to be in bit positions zero
through fifteen. Thus, the half-word ROM’s data lines should be interfaced
to ’C40 data lines (L)D15-0. For byte-wide ROMs, the data read is expected
to be in bit positions zero through seven. Hence, the byte-wide ROM’s data
lines should be interfaced to ’C40 data lines (L)D7-0. Even though the ’C40
does not require that unused data lines be pulled up to Vcc, it is
recommended that each unused data line be pulled up to 5 volts through a
separate 22-kilohm resistor for minimum power dissipation.
Example 13-1. Byte-Wide Configured Memory

Address     Comments
...         Global memory bus control word = 1D7BC9F0h
            (Described in Figure 7-2 on page 7-7.)
0300008h    Local memory bus control word = 1D739250h
0300009h    (Described in Figure 7-2 on page 7-7.)
030000Ah
030000Bh
...         IVTP = 002FF800h
...         TVTP = 002FF800h
...         Memory location for IACK instruction = 30 0000h
            (This is the final word in the data stream.)

Note: Shaded area identifies source program block.
Example 13-2. 16-Bits-Wide Configured Memory

Address     Value    Comments
0300000h    0010h    Memory width = 16 bits
0300001h    0000h
0300002h    C9F0h    Global memory bus control word = 1D7BC9F0h
0300003h    1D7Bh
0300004h             Local memory bus control word = 1D739250h
...
03002DEh
03002DFh    002Fh
...
            0030h    Memory location for IACK instruction = 30 0000h
                     (This is the final word in the data stream.)

Note: Shaded areas identify source program blocks.
Example 13-3. 32-Bits-Wide Configured Memory

Word   Address     Value        Comments
1      0300000h    00000020h    Memory width = 32 bits
2      0300001h    1D7BC9F0h    Global memory bus control word = 1D7BC9F0h
...

Note: Shaded areas identify source program blocks.




13.2.6 IIOF(3—1) Pin Loading 


The load options are based upon the status of IIOF(3-1) as general-purpose
input pins. Therefore, in order to select the correct boot loader mode,
pins IIOF(3-1) must be kept at a constant valid status value for a certain
time period (values listed in Table 13-1 on page 13-5). For detailed
information, see the ’C40 boot loader program in Figure 13-4, starting on
page 13-14.

After the boot load is complete, the IACK signal is brought low for one
cycle. Figure 13-3 shows an example circuit that generates the IIOF(3-1)
signals for boot load selection and also allows incoming external interrupts
during normal mode of operation. In this example, after reset, the IIOF pins
stay low until the IACK signal is received.

Figure 13-3. Circuit for Generation of a Low IIOF Signal for Boot Loader Selection

[Circuit diagram. Labels: +5 V, 20K resistor, external interrupt input,
IACK (from ’C40), and IIOFn (n = 1, 2, or 3) on the TMS320C40.]
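
The mode selection can be summarized with a short Python model. It is a sketch based on the comparison constants and comments in the boot loader listing (Figure 13-4), not on Table 13-1, which remains the authoritative reference. The names boot_mode and iif_compare_value are hypothetical, and the bit positions used for the pin flags are inferred from the listing's CMPI constants rather than taken from the IIF register description.

```python
# Keys are (IIOF3, IIOF2, IIOF1) as read at reset; values are the memory
# boot source addresses given in the listing comments.
BOOT_SOURCES = {
    (1, 1, 0): 0x00300000,
    (1, 0, 1): 0x40000000,
    (1, 0, 0): 0x60000000,
    (0, 1, 1): 0x80000000,
    (0, 1, 0): 0xA0000000,
    (0, 0, 1): 0xC0000000,
}


def boot_mode(iiof3, iiof2, iiof1):
    """Return (mode, source_address) for one combination of pin levels."""
    pins = (iiof3, iiof2, iiof1)
    if any(p not in (0, 1) for p in pins):
        raise ValueError("pin levels must be 0 or 1")
    if pins == (1, 1, 1):
        return ("comm_port", None)       # all ones: communication port load
    if pins == (0, 0, 0):
        return ("reserved", None)        # the listing branches to RESERVED
    return ("memory", BOOT_SOURCES[pins])


def iif_compare_value(iiof3, iiof2, iiof1, iiof0=1):
    """Rebuild the constant the listing compares against IIF.

    Assumption: each pin's flag sits at bit 2 of its own 4-bit field,
    which is what constants such as 04404H and 00044H imply.
    """
    return (iiof3 << 14) | (iiof2 << 10) | (iiof1 << 6) | (iiof0 << 2)


print(boot_mode(1, 0, 1))
print(hex(iif_compare_value(1, 1, 0)))
```

Keeping the table as data makes it easy to check a board's pull-up and pull-down choices against the intended boot source before committing to hardware.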


13.2.7 TMS320C40 Boot Loader Source Program 


Figure 13-4. Boot Loader Source Program

***********************************************************************
*
*  C40BOOT - TMS320C40 BOOT LOADER PROGRAM
*  (C) COPYRIGHT TEXAS INSTRUMENTS INC., 1990
*
*  NOTE: 1. AFTER DEVICE RESET, THE PROGRAM IS CHECKING
*           THE INPUT STATUS OF IIOF1-3 PINS AND COMMUNICATION
*           PORT INPUT FLAGS TO CONFIGURE ITSELF WHEN ON CHIP
*           ROM IS ENABLED (ROMEN=1). THE IIOF0 PIN IS ASSUMED
*           TO BE PULLED HIGH.
*
*        2. THE FUNCTION SELECTION OF IIOF1-3 IS LISTED AS:
*
*           IIOF3  IIOF2  IIOF1   FUNCTION
*             1      1      0     Memory boot loader from 00300000H
*             1      0      1     Memory boot loader from 40000000H
*             1      0      0     Memory boot loader from 60000000H
*             0      1      1     Memory boot loader from 80000000H
*             0      1      0     Memory boot loader from A0000000H
*             0      0      1     Memory boot loader from C0000000H
*             1      1      1     Communication Port boot loader
*
*           THE PROGRAM ASSUMES THE COMMUNICATION PORT BOOT
*           LOADER IS THE DEFAULT FUNCTION. IF NO OTHER
*           FUNCTION IS SELECTED, THE PROGRAM STARTS CHECKING
*           THE COMMUNICATION PORT INPUT CHANNELS. IF THERE IS
*           NO INPUT FROM A COMMUNICATION PORT, THE PROGRAM
*           RECHECKS THE IIOF(3-1) STATUS AGAIN.
*
*        3. MEMORY BOOT LOADER LOADS WORD, HALF-WORD, OR BYTE
*           WIDE PROGRAM TO DIFFERENT SPECIFIED LOCATIONS. THE 8
*           LSBs OF THE FIRST MEMORY SPECIFIES THE MEMORY WIDTH.
*           IF THE HALF-WORD OR BYTE WIDE PROGRAM IS SELECTED,
*           THE LSBs ARE LOADED FIRST AND THEN THE MSBs. THE NEXT
*           2 WORDS CONTAIN THE CONTROL WORD FOR THE GLOBAL AND
*           LOCAL MEMORY INTERFACE CONTROL REGISTERS. NEXT COME
*           THE PROGRAM BLOCKS. THE FIRST TWO WORDS OF EACH
*           PROGRAM BLOCK CONTAIN THE BLOCK SIZE AND DESTINATION
*           ADDRESS WHERE THE PROGRAM IS TO BE LOADED. WHEN THE
*           ZERO BLOCK SIZE IS READ, THE PROGRAM BLOCK LOADING
*           IS TERMINATED. THE NEXT TWO WORDS ARE THE
*           INITIAL VALUES FOR THE IVTP AND TVTP REGISTERS.
*           AFTER THE BOOT LOADING IS COMPLETED, THE IACK SIGNAL
*           WILL BE SENT OUT ACCORDING TO THE LAST WORD OF THE
*           SOURCE MEMORY, AND THE PROGRAM COUNTER WILL
*           BRANCH TO THE STARTING ADDRESS OF THE FIRST
*           PROGRAM BLOCK.
*
*        4. IF THE IIOF(3-1) ARE SETUP FOR COMMUNICATION PORT
*           BOOTLOADER, THE PROCESSOR WILL WAIT FOR THE FIRST
*           INPUT FROM AN INPUT COMMUNICATION CHANNEL AND USE
*           THAT CHANNEL TO PERFORM THE DOWNLOAD. THE BEGINNING
*           TWO WORDS SHOULD CONTAIN THE GLOBAL AND LOCAL
*           BUS CONTROL WORDS. SIMILAR TO THE MEMORY LOADER,
*           PROGRAM CAN BE LOADED INTO DIFFERENT MEMORY
*           BLOCKS. FIRST TWO WORDS OF EACH PROGRAM BLOCK
*           CONTAIN BLOCK SIZE AND MEMORY ADDRESS TO BE LOADED
*           INTO. WHEN THE ZERO BLOCK SIZE IS READ, THE PROGRAM
*           BLOCK LOADING IS TERMINATED. IN OTHER WORDS,
*           IN ORDER TO TERMINATE THE PROGRAM BLOCK LOADING, A
*           ZERO HAS TO BE ADDED AT THE END OF PROGRAM BLOCK.
*           THE FOLLOWING TWO WORDS ARE THE INITIAL VALUES FOR
*           THE IVTP AND TVTP REGISTERS. AFTER THE BOOT LOADING
*           IS COMPLETED, THE IACK SIGNAL WILL BE SENT OUT
*           ACCORDING TO THE LAST WORD OF THE SOURCE MEMORY,
*           AND THE PROGRAM COUNTER WILL BRANCH TO THE STARTING
*           ADDRESS OF THE FIRST PROGRAM BLOCK.
*
***********************************************************************
          .page
***********************************************************************
*                           RESET VECTOR                              *
***********************************************************************
          .sect   "vectors"
RESET     .word   START           ; On hardware RESET go to START
***********************************************************************
*                 TMS320C40 PROCESSOR BOOT LOADER                     *
***********************************************************************
          .text
START:    CMPI    04440H,IIF      ; Test IIOF0 pin condition
          BEQ     LIFETEST        ; If low, execute life test
          LDHI    0010H,AR0       ; Load peripheral mem. map start
                                  ;   addr 100000H
          LDHI    002FH,SP        ; Initialize stack pointer SP to
          OR      0FFF0H,SP       ;   internal RAM address 2FFFF0H
          LDI     0,R0            ; Set start address flag off
          LDI     COM_LOAD,R10    ; Comm. port load subroutine
                                  ;   address -> R10
*
*         CHECK THE IIOF1-3 FOR THE BOOT LOADER
*
CHECK:    LDHI    0030H,AR1       ; Load memory address = 00300000H
          CMPI    04404H,IIF      ; Test function 110 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ;   loader
          LDHI    04000H,AR1      ; Load memory address = 40000000H
          CMPI    04044H,IIF      ; Test function 101 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ;   loader
          LDHI    06000H,AR1      ; Load memory address = 60000000H
          CMPI    04004H,IIF      ; Test function 100 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ;   loader
          LDHI    08000H,AR1      ; Load memory address = 80000000H
          CMPI    00444H,IIF      ; Test function 011 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ;   loader
          LDHI    0A000H,AR1      ; Load memory address = A0000000H
          CMPI    00404H,IIF      ; Test function 010 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ;   loader
          LDHI    0C000H,AR1      ; Load memory address = C0000000H
          CMPI    00044H,IIF      ; Test function 001 condition
          BEQ     MEMORY          ; If true, execute memory boot
                                  ;   loader
          CMPI    00004H,IIF      ; Test function 000 condition
          BEQ     RESERVED        ; If true, branch to reserve
*
*         CHECK COMMUNICATION PORT INPUT CHANNEL
*
          ADDI    040H,AR0,AR3    ; Point to comm. port 0
                                  ;   control register addr
          LDI     5,AR1           ; Set loop counter for
                                  ;   CHECK_CH loop
CHECK_CH: LSH3    -9,*AR3,R1      ; Check comm port input
          BNZ     LOAD1           ; If input exist, start comm
                                  ;   port loader
          ADDI    010H,AR3        ; Point to next comm. port
                                  ;   channel addr
          DBU     AR1,CHECK_CH    ; Check next comm. port
                                  ;   channel input
          B       CHECK           ; Recheck the input flags
*
*         TEST MEMORY WORD WIDTH
*
MEMORY:   LDI     *AR1++(1),R1    ; Load the memory word width
          LDI     W_WIDE,R10      ; Full-word size subroutine
                                  ;   address -> R10
          LSH     26,R1           ; Test bit5 of mem. width word
          BN      LOAD0           ; If '1' start PGM loading
                                  ;   (32 bits width)
          NOP     *AR1++(1)       ; Jump last half word from
                                  ;   mem. word
          LDI     H_WIDE,R10      ; Half-word size subroutine
                                  ;   address -> R10
          LSH     1,R1            ; Test bit4 of mem. width word
          BN      LOAD0           ; If '1' start PGM loading
                                  ;   (16 bits width)
          LDI     B_WIDE,R10      ; Byte size subroutine address
                                  ;   -> R10
          ADDI    2,AR1           ; Jump last 2 bytes from
                                  ;   mem. word
*
*         START PROGRAM LOADING
*
LOAD0:    CALLU   R10             ; Load new word according to
                                  ;   mem. width
          STI     AR2,*AR0        ; Set global bus control
                                  ;   register
          CALLU   R10             ; Load new word according to
                                  ;   mem. width
          STI     AR2,*+AR0(4)    ; Set local bus control
                                  ;   register
LOAD2:    CALLU   R10             ; Load new word according to
*                                 ;   mem. width
          SUBI3   1,AR2,RC        ; Set block size for
                                  ;   repeat loop
          CMPI    -1,RC           ; If 0 block size start PGM
          BEQ     IVTP_LOAD
          CALLU   R10             ; Load new word according to
                                  ;   mem. width
          LDI     AR2,AR0         ; Set destination address
          LDI     R0,R0           ; Test start address loaded
                                  ;   flag
          LDIZ    AR2,R9          ; Load start address if flag
                                  ;   off
          LDI     -1,R0           ; Set start & dest. address
                                  ;   flag on
          SUBI    1,R10           ; Sub address with loop
          CALLU   R10             ; Load block words according
                                  ;   to mem. width
          LDI     1,R0            ; Set dest. address flag off
          ADDI    1,R10           ; Sub address without loop
          B       LOAD2           ; Jump to load a new block
                                  ;   when loop completed
*
*         INITIALIZE IVTP AND TVTP REGISTERS
*
IVTP_LOAD:CALLU   R10             ; Load new word according to
                                  ;   mem. width
          LDPE    AR2,IVTP        ; Load the IVTP pointer
TVTP_LOAD:CALLU   R10             ; Load new word according to
*                                 ;   mem. width
          LDPE    AR2,TVTP        ; Load the TVTP pointer
          CALLU   R10             ; Load new word according to
                                  ;   mem. width
          IACK    *AR2            ; Send out IACK signal
          BU      R9              ; Branch to start of program


***********************************************************************
*             BYTE-WIDE MEMORY BOOT LOADER SUBROUTINE                 *
***********************************************************************
LOOP_B:   RPTB    LOAD_B          ; PGM load loop
B_WIDE:   LWL0    *AR1++(1),AR2   ; Load byte 0 (LSB)
          LWL1    *AR1++(1),AR2   ; Join byte 1 with byte 0
          LWL2    *AR1++(1),AR2   ; Join byte 2 with byte 0 & 1
          LWL3    *AR1++(1),AR2   ; Join byte 3 with byte 0, 1, & 2
          LDI     R0,R0           ; Test load address flag
          BNN     B_END
LOAD_B    STI     AR2,*AR0++(1)   ; Store new word to dest.
                                  ;   address
B_END     RETSU                   ; Return from subroutine
***********************************************************************
*           HALF-WORD WIDE MEMORY BOOT LOADER SUBROUTINE              *
***********************************************************************
LOOP_H:   RPTB    LOAD_H          ; PGM load loop
H_WIDE:   LWL0    *AR1++(1),AR2   ; Load LSB half-word
          NOP
          LWL2    *AR1++(1),AR2   ; Join MSB half-word with
*                                 ;   LSB half-word
          LDI     R0,R0           ; Test load address flag
          BNN     H_END
LOAD_H    STI     AR2,*AR0++(1)   ; Store new word to dest.
*                                 ;   address
H_END     RETSU                   ; Return from subroutine
***********************************************************************
*           FULL-WORD WIDE MEMORY BOOT LOADER SUBROUTINE              *
***********************************************************************
LOOP_W    RPTB    LOAD_W          ; PGM load loop
W_WIDE    LDI     *AR1++(1),AR2   ; Read a new 32 bits word
          LDI     R0,R0           ; Test load address flag
          BNN     W_END
LOAD_W    STI     AR2,*AR0++(1)   ; Store new word to dest.
*                                 ;   address
W_END     RETSU                   ; Return from subroutine
***********************************************************************
*            COMMUNICATION PORT BOOT LOADER SUBROUTINE                *
***********************************************************************
LOOP_C    RPTB    LOAD_C          ; PGM load loop
COM_LOAD  LSH3    -9,*AR3,R1      ; Check comm port input
          BZ      COM_LOAD        ; Wait for comm port input
          LDI     *+AR3(1),AR2    ; Read a new 32 bits word
          LDI     R0,R0           ; Test load address flag
          BNN     C_END
LOAD_C    STI     AR2,*AR0++(1)   ; Store new word to dest.
*                                 ;   address
C_END     RETSU                   ; Return from subroutine
RESERVED:
          .end
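
The data stream format described in the listing's notes can be traced with a small Python model. This is a sketch of the memory boot loader's stream layout after the words have been packed to 32 bits, not a translation of the assembly; the names decode_width and parse_boot_stream are hypothetical, and the two-word program block in the usage example is invented for illustration. The control word, IVTP, TVTP, and IACK values are the ones quoted in Examples 13-1 through 13-3.

```python
def decode_width(first_word):
    """Memory width in bits, using the bit tests the listing performs."""
    if first_word & 0x20:        # bit 5 set: 32-bit wide memory
        return 32
    if first_word & 0x10:        # bit 4 set: 16-bit wide memory
        return 16
    return 8                     # otherwise byte-wide


def parse_boot_stream(words):
    """Split a packed memory boot stream into its fields."""
    pos = 0

    def take():
        nonlocal pos
        if pos >= len(words):
            raise ValueError("boot stream ended early")
        value = words[pos]
        pos += 1
        return value

    image = {"width": decode_width(take())}
    image["global_ctl"] = take()         # global bus control word
    image["local_ctl"] = take()          # local bus control word
    image["blocks"] = []
    while True:
        size = take()
        if size == 0:                    # zero block size ends the blocks
            break
        dest = take()
        data = [take() for _ in range(size)]
        image["blocks"].append((dest, data))
    image["ivtp"] = take()
    image["tvtp"] = take()
    image["iack_addr"] = take()          # final word in the stream
    # Execution starts at the destination of the first program block.
    image["entry"] = image["blocks"][0][0] if image["blocks"] else None
    return image


stream = [
    0x00000020, 0x1D7BC9F0, 0x1D739250,           # width, global, local
    2, 0x002FF800, 0x11111111, 0x22222222,        # hypothetical block
    0,                                            # terminator
    0x002FF800, 0x002FF800, 0x00300000,           # IVTP, TVTP, IACK addr
]
image = parse_boot_stream(stream)
print(image["width"], hex(image["entry"]), len(image["blocks"]))
```

The communication port loader uses the same layout without the leading memory width word, so a model of that path would skip the first field.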
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13.3 Global and Local Bus Interface 


The ’C40 uses the global and local buses to access the majority of its
memory-mapped locations. Since these two memory interfaces are identical
in every way, except for their positions in the memory map, each example
in this memory interface section focuses on only one of the two
interfaces. However, all of the examples are applicable to either the
local or global bus. Additionally, each of the buses features two
identical, mutually exclusive sets of control signals:


Global Bus    Local Bus
STRB0         LSTRB0
STRB1         LSTRB1
CE0           LCE0
CE1           LCE1
RDY0          LRDY0
RDY1          LRDY1


Also, AE and DE put the global bus in high impedance, and LAE and LDE 
put the local bus in high impedance. 


Although both the global and the local buses can interface to a wide variety 
of devices, the devices most commonly interfaced are memories. There- 
fore, memory interface examples are used in this section. 


13.3.1 Zero Wait-State Interface to RAMs 
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For a full-speed, zero wait-state interface to any device, a 50-MHz ’C40 
(40-ns instruction cycle time) requires a read access time of 21 ns from
address stable to data valid. For most memories, the access time from chip
enable is the same as the access time from address; thus, it is possible to
use 20-ns memories at full speed with a 50-MHz ’C40. However, to properly
use 20-ns memories, there can be no long delays between the processor and
the memories. Avoiding these delays is not always possible in practice,
because of interconnection delays and the fact that gating is sometimes
required for chip enable generation. In addition, if a memory device with an
output enable is chosen, output enable must become active soon enough
to ensure that the memory can meet the data valid timing requirements of
the ’C40. For memories with 20-ns access times, the output enable active
to data valid timing parameter is typically less than 10 ns.


Currently available RAMs without output enable (OE) control lines include
the 1-bit wide organized RAMs and most of the 4-bit wide RAMs. Those with
OE controls include the byte-wide and a few of the 4-bit wide RAMs. Many
of the fastest RAMs do not provide OE control; they use chip-enable (CE)
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controlled write cycles to ensure that data outputs do not turn on for write 
operations. In CE-controlled write cycles, the write control line (WE) goes 
low before CE goes low, and internal logic holds the outputs disabled until 
the cycle is completed. Using CE-controlled write cycles is an efficient way 
to interface fast RAMs without OE controls to the ’C40 at full speed. 


13.3.1.1 RAM Interface - Using One Local Strobe 


Figure 13-5 shows the ’C40’s local bus interfaced to the Integrated Device
Technology™ IDT71258 20-ns 64K x 4-bit CMOS static RAMs with zero wait
states using chip enable-controlled write cycles. These RAMs are arranged
to implement 64K 32-bit words located at addresses 00000h through 0FFFFh
(internal ROM is assumed to be disabled), which are the first 64K words in
external memory. If these 64K words of SRAM are the only memory
controlled by LSTRB0, the LSTRB ACTIVE field of the local memory interface
control register (LMICR) should be set to its minimum value, 01111b,
allowing LSTRB0 to be active only for the first 64K words of the ’C40’s
memory space. (The memory interface control register and its various fields
are shown in Figure 7-2 on page 7-7.) In addition, because this memory is
the only memory interfaced to LSTRB0, LSTRB0 requires only one page, so
the PAGESIZE field of the LMICR should be set to 01111b. Also note that in
Figure 13-5, the LRDY0 input is tied low, selecting zero wait states for all
LSTRB0 accesses on the local bus. With all of the zero-wait-state memory
controlled by LSTRB0, LSTRB1 can be used to control accesses to slower
read-only memory devices or other types of memory.
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Figure 13-5. TMS320C40 Interface to Zero-Wait-State SRAM 


[Circuit diagram: the TMS320C40 local bus connected to IDT71258 RAMs.
RAM pins shown: A15-A0, CS, WE, and I/O3-0; the data bus is 32 bits wide.]
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In this circuit implementation, no external logic is necessary to interface the 
’C40 to the memory device. This glueless interface is possible because
changes in LR/W are always framed by LSTRB. For typical memory de- 
vices, it is necessary to hold the device inactive (CS inactive) during 
changes in WE; this avoids undesired memory accesses while the address 
changes. The ’C40 ensures this by having LSTRB always frame changes 
in LR/W. (See Section 7.5 on page 7-17 for more information.) 


13.3.1.2 Consecutive Reads Followed by a Write Interface Timing 


Figure 13-6 shows the timing of consecutive reads followed by a write. For
consecutive reads, LSTRB0 stays active (low), and LR/W stays high as long
as read cycles continue. The critical timing that must be met for
back-to-back reads is the address-valid to data-valid time. The ’C40
requires zero-wait-state memories to have an address-valid to data-valid
time of less than 21 ns. This limit is calculated as:


one H1 cycle time - [(H1 low to address-valid time) + (data setup time before H1 low)]


For most memory devices, this time is the same as the memory access time,
which is t1 = 20 ns. Thus, memories with access times of 25 ns or more
cannot meet this timing.
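
The read budget above is simple arithmetic and can be checked with a short Python sketch. The function names are hypothetical. The text gives only the 40-ns H1 cycle and the 21-ns result, so the two subtracted terms must total 19 ns; the 9-ns and 10-ns split used below is an assumed illustration, and the real values come from the ’C40 timing specifications.

```python
def read_access_budget_ns(h1_cycle_ns, h1_low_to_addr_valid_ns, data_setup_ns):
    """Longest address-valid to data-valid time the memory may take."""
    return h1_cycle_ns - (h1_low_to_addr_valid_ns + data_setup_ns)


def read_margin_ns(budget_ns, memory_access_ns):
    """Positive result: the memory meets the budget with this much slack."""
    return budget_ns - memory_access_ns


# 40-ns H1 cycle; 9 ns + 10 ns is an assumed split of the 19 ns total.
budget = read_access_budget_ns(40, 9, 10)
for access_ns in (20, 25):
    print(access_ns, "ns memory, margin", read_margin_ns(budget, access_ns), "ns")
```

Any delay added between the processor and the memory, such as gating on chip enable, has to come out of the same margin.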


Memory device timing is not as critical for zero-wait-state write cycles as
for nonzero-wait-state write cycles, because ’C40 writes take two H1
cycles. The extra cycle gives LSTRB0 enough time to frame LR/W, preventing
memories that go into high impedance slowly at the end of a read cycle from
driving the bus during the subsequent write cycle. For the memory device
used in this design (Figure 13-6), the data lines are guaranteed to be
three-stated (t2 = 10 ns) after CS goes inactive, which gives more than
23 ns of margin before the ’C40 starts driving the bus with write data.
Also, the extra cycle with LSTRB0 inactive prevents writes to random
locations in memory while the address is changing between consecutive
writes.


For the write cycles shown in Figure 13-6 and Figure 13-7, the RAM
requires 15 ns of write data setup before CS goes high, and this design
provides at least 24 ns (t3). A data hold time of 0 ns (t4) is required by
the RAM, and this design provides greater than 13 ns. Finally, the RAM’s
setup and hold times for address (with respect to CS high) of 20 and 0 ns,
respectively, are also met with a clear margin.
Figure 13-6. Consecutive Reads Followed by a Write

[Timing diagram: LR/W0, LD(31-0), and LA(30-0) for consecutive read cycles
followed by a write cycle.]


Figure 13-7. Consecutive Writes Followed by a Read

[Timing diagram: STRB0, LD(31-0), and LA(30-0) for consecutive write cycles
followed by a read cycle.]
13.3.1.3 Consecutive Writes Followed by a Read Interface Timing


Figure 13-7 shows the timing of consecutive writes followed by a read.
Notice that between consecutive writes, LR/W stays low, but STRB0 goes
inactive to frame the write cycles. Although ’C40 zero-wait-state writes
take two H1 cycles, internally (from the perspective of the CPU and DMA)
writes appear to take one cycle if no access to that interface is already
in progress.


In the read cycle following the writes in Figure 13-7, the ’C40 requires
zero-wait-state memories to have an LSTRB active to data-valid time of less
than 21 ns (one H1 cycle time minus the sum of the H1 low to LSTRB active
time and the data setup time before H1 low). For most memory devices, this
time is the same as the memory access time, which is t1 = 20 ns in this
design. Thus, a margin of only 1 ns exists, leaving little time for STRB
gating if desired.


13.3.1.4 RAM Interface Using Both Local Strobes


Figure 13-8 shows the ’C40’s local bus interfaced to IDT71258 RAMs
(20-ns 64K x 4-bit CMOS static RAMs) with zero wait states using
CS-controlled write cycles. These RAMs are arranged to allow 128K 32-bit
words of local memory, which is implemented as two 64K x 32-bit banks. One
bank is controlled by each of the two sets of control signals on the local
bus. To map these memory devices properly in the ’C40’s memory space, you
must use the local memory interface control register (LMICR) to define
which part of the local bus’s memory space is mapped to each of the two
strobes. In this implementation with internal ROM disabled, LSTRB0 is
mapped to the first 64K words of the local space (addresses 0h through
0FFFFh), and LSTRB1 is mapped to the rest of the local space (addresses
10000h through 7FFF FFFFh). For this memory configuration, the LSTRB ACTIVE
field of the LMICR should be set to 01111b. Also, each LSTRB requires only
one page, so the PAGESIZE field of the LMICR should be set to 01111b. Note
that in Figure 13-8, the LRDY inputs are tied low, selecting zero wait
states for all accesses on the local bus.


Hence, through the use of the ’C40’s four strobes (two each on the local and 
global buses), four different banks of memory can be decoded. In addition, 
the address decoding can be changed under program control by changing 
the LSTRB ACTIVE field (bits 24-28) of the LMICR or the global memory
interface control register (GMICR). If more than four banks of memory must be
decoded or if the chosen memory device cannot meet the read cycle timing 
requirements for the ’C40 at zero wait states, page switching (discussed in 
subsection 13.4.6 on page 13-32) should be used to add an extra cycle to 
read accesses outside the current bank boundary. 
Figure 13-8. TMS320C40 Interface to Zero-Wait-State SRAMs, Two Strobes
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13.4 Wait States and Ready Generation 


The use of wait states can greatly increase system flexibility and reduce 
hardware requirements over systems without wait-state capability. The 
’C40 has the capability of generating wait states on either the global bus or
the local bus, and both buses have independent sets of ready control logic. 
The buses’ wait-state configuration is determined by the SWW and WT CNT 
fields of the local and global bus interface control registers (see Section 7.4, 
page 7-15, for a detailed description of the wait-state options). 


This section discusses ready generation from the perspective of the global
bus interface; however, wait-state operation on the local bus is the same as
on the global bus, so this discussion pertains equally well to both. Also,
the local and global buses each have two sets of control signals (R/W0,
STRB0, RDY0 and R/W1, STRB1, RDY1), with each set having its own ready
signal, which provides more flexibility in support of external devices with
different speeds. Since both strobes’ ready signals share the same
electrical characteristics, the following discussion focuses on one of the
global bus’s sets of control signals.


Wait states are generated on the basis of:

- the internal wait-state generator,
- the external ready inputs (RDY0 or RDY1), or
- the logical AND or OR of the two (discussed in Section 7.4, page 7-15).


When enabled, internally generated wait states affect all external cycles, 
regardless of the address accessed. If different numbers of wait states are 
required for various external devices, the external RDY input may be used 
to customize wait-state generation to specific system requirements. 


If the logical OR (electrical AND, since the signals are true low) of
the external and wait-count ready signals is selected, the earlier of the two
signals generates a ready condition and allows the cycle to be completed.
It is not required that both signals be present.


Note: STRBx SWW Field Values

The STRBx SWW fields of the memory-interface control register are shown
in Figure 7-2 (page 7-7) and explained in Table 7-7 (page 7-16).
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13.4.1 ORing of the Ready Signals (STRBx SWW = 10) 


The OR of the two ready signals can be used to implement wait states for
devices that require a greater number of wait states than are implemented
with internal logic (up to seven). This feature is useful, for example, if a
system contains some fast and some slow devices. In this case:

- Fast devices can generate ready externally with a minimum of logic.
  When fast devices are accessed, the external hardware responds
  promptly with ready, which terminates the cycle.

- Slow devices can use the internal wait counter for larger numbers of
  wait states. When slow devices are accessed, the external hardware
  does not respond, and the cycle is appropriately terminated after the
  internal wait count.


The OR of the two ready signals may also be used if conditions occur that
require termination of bus cycles before the number of wait states
implemented with external logic. In this case, a shorter wait count is
specified internally than the number of wait states implemented with the
external ready logic, and the bus cycle is terminated after the wait count.
This feature may also be used as a safeguard against inadvertent accesses to
nonexistent memory that would never respond with ready and would therefore
lock up the ’C40.


If the OR of the two ready signals is used, however, and the internal
wait-state count is less than the number of wait states implemented
externally, the external ready generation logic must have the ability to
reset its sequencing to allow a new cycle to begin immediately following the
end of the internal wait count. This requires that, under these conditions:

- consecutive cycles must be from independently decoded areas of
  memory (or from different pages in memory), and

- the external ready generation logic must be capable of restarting its
  sequence as soon as a new cycle begins.


Otherwise, the external ready generation logic may lose synchronization
with bus cycles and therefore generate improperly timed wait states.


13.4.2 ANDing of the Ready Signals (STRBx SWW = 11) 
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If the logical AND (electrical OR) of the wait count and external ready signals 
is selected, the later of the two signals will control the internal ready
signal, but both signals must occur. Accordingly, external ready control must
be implemented for each wait-state device, and the wait count ready signal
must be enabled.


This feature is useful if there are devices in a system that are equipped to
provide a ready signal but cannot respond quickly enough to meet the
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’C40’s timing requirements. In particular, if these devices normally indicate 
a ready condition and, when accessed, respond with a wait until they be- 
come ready, the logical AND of the two ready signals can be used to save 
hardware in the system. In this case, the internal wait counter can provide 
wait states initially, and then the external ready can provide wait states after 
the external device has had time to send a not-ready indication. The internal 
wait counter then remains ready until the external device also becomes 
ready, which terminates the cycle. 


Additionally, the AND of the two ready signals may be used for extending 
the number of wait states for devices that already have external ready logic 
implemented but require additional wait states under certain unique circum- 
stances. 
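
The difference between the two combining modes can be made concrete with a small Python model. It is a simplified sketch that counts whole wait states and ignores setup timing; the function name and the string mode labels are hypothetical, chosen to match the STRBx SWW settings named in the headings of Sections 13.4.1 (OR, SWW = 10) and 13.4.2 (AND, SWW = 11).

```python
def wait_states(mode, internal_count, external_ready_after):
    """Wait states inserted before the bus cycle terminates.

    mode                 -- "or" (SWW = 10) or "and" (SWW = 11)
    internal_count       -- wait count programmed in the internal generator
    external_ready_after -- wait states before external ready is asserted,
                            or None if the external logic never responds
    Returns None when the cycle would never terminate.
    """
    if mode == "or":
        # The earlier of the two signals ends the cycle.
        if external_ready_after is None:
            return internal_count
        return min(internal_count, external_ready_after)
    if mode == "and":
        # The later of the two ends the cycle, and both must occur.
        if external_ready_after is None:
            return None
        return max(internal_count, external_ready_after)
    raise ValueError("mode must be 'or' or 'and'")


print(wait_states("or", 7, 0))       # fast device answers immediately
print(wait_states("or", 3, None))    # no external response: count ends cycle
print(wait_states("and", 2, 5))      # slow external ready extends the cycle
```

The "or" case with no external response is the safeguard against accesses to nonexistent memory described in Section 13.4.1, while the same condition in "and" mode leaves the cycle waiting indefinitely.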


13.4.3 External Ready Generation 


In the implementation of external ready generation hardware, the particular
technique employed depends heavily on the specific characteristics of the
system. The optimum approach to ready generation varies, depending on
the relative number of wait-state and nonwait-state devices in the system
and on the maximum number of wait states required for any one device. The
approaches discussed here are intended to be general enough for most
applications and are easily modifiable to accommodate many different
system configurations.


In general, ready generation involves the following three functions: 

1) Segmentation of the address space in some fashion to distinguish fast 
and slow devices. 

2) Generation of properly timed ready indications. 

3) Logical ORing of all the separate ready timing signals together to 
connect to the physical ready input. 


Segmentation of the address space is required to obtain a unique indication
of each particular area within the address space that requires wait states.
This segmentation is commonly implemented in a system in the form of
chip-select generation. Chip-select signals may be used to initiate wait
states in many cases; however, occasionally, chip-select decoding
considerations may provide signals that will not allow ready input timing
requirements to be met. In this case, coarse address space segmentation
may be made on the basis of a small number of address lines, where simpler
gating allows signals to be generated more quickly. In either case, the signal
indicating that a particular area of memory is being addressed is normally
used to initiate the ready or wait-state signal.


Once the region of address space being accessed has been established,
a timing circuit of some sort is normally used to provide a ready indication
to the processor at the appropriate point in the cycle to satisfy each
device’s unique requirements.


Finally, since indications of ready status from multiple devices are typically 
present, the signals are logically ORed by using a single gate to drive the 
RDY input. 
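
The three functions listed above can be sketched in software as a rough behavioral model. This is only an illustration, not a description of any particular circuit: the address map in REGIONS, the function names, and the use of positive logic (True means ready) are all assumptions made for the example.

```python
# Hypothetical address map: (first address, last address, wait states).
# These regions and wait counts are illustrative assumptions only.
REGIONS = [
    (0x00000000, 0x0000FFFF, 0),  # fast device, zero wait states
    (0x00010000, 0x0001FFFF, 1),  # slower device, one wait state
    (0x00020000, 0x0002FFFF, 2),  # slow device, two wait states
]


def select_region(address):
    """Function 1: segment the address space (a coarse chip-select decode)."""
    for index, (first, last, _waits) in enumerate(REGIONS):
        if first <= address <= last:
            return index
    return None


def device_ready(index, address, cycle):
    """Function 2: a properly timed ready indication for one device.

    The device reports ready only when it is the one being addressed and
    its wait-state count has elapsed (cycle 0 is the first cycle).
    """
    selected = select_region(address) == index
    return selected and cycle >= REGIONS[index][2]


def rdy(address, cycle):
    """Function 3: OR the separate ready indications onto one ready input."""
    return any(device_ready(i, address, cycle) for i in range(len(REGIONS)))
```

For example, `rdy(0x00010000, 0)` is False and `rdy(0x00010000, 1)` is True, because the second region is given one wait state in this hypothetical map.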


13.4.4 Ready Control Logic 


One of two basic approaches may be taken in the implementation of ready
control logic, depending upon the state of the ready input between
accesses. If RDY is low between accesses, the processor is always ready
unless a wait state is required; if RDY is high between accesses, the processor
will always enter a wait state unless a ready indication is generated.


If RDY is low between accesses, control of devices that are zero-wait-
state at full speed is straightforward; no action is necessary, because ready
is always active unless otherwise required. Devices requiring wait states, 
however, must drive ready high fast enough to meet the input timing require- 
ments. Then, after an appropriate delay, a ready indication must be gener- 
ated. This can be quite difficult in many circumstances because wait-state 
devices are inherently slow and often require complex select decoding. 


If RDY is high between accesses, zero-wait-state devices, which tend to
be inherently fast, can usually respond immediately with a ready indication.
Wait-state devices may simply delay their select signals appropriately to
generate a ready. Typically, this approach results in the most efficient
implementation of ready control logic. Figure 13-9 shows a circuit of this type,
which can be used to generate 0, 1, or 2 wait states for multiple devices in
a system.


Figure 13-9. Logic for Generation of 0, 1, or 2 Wait States for Multiple Devices

(Block diagram: TMS320C40 address bus bits for device selection and signals
from the ’C40 feed a ’P16R4 PLD clocked by H3; the PLD drives RDY0 to the ’C40.)


13.4.5 Example Circuit 


Figure 13-9 shows how a single, 7-ns 16R4 programmable logic device 
(PLD) can be used to generate 0, 1, and 2 wait states for multiple devices 
that are interfaced to a TMS320C40. In this example, distinct address bits 
are used to select the different wait-state devices. Here, each of the three 
address lines input to the 16R4 corresponds to a different speed device. 
For a single 16R4 implementation, up to ten different address bits can be 
used to select different speed devices. 


The single output, 4Q, of the PLD is connected directly to the RDY0 input
of the TMS320C40 to signal the completion of a bus access when external
wait-state generation is desired (see Section 7.4 on page 7-15 for more
information on TMS320C40 wait-state options). Since RDY0 is sampled on
the falling edge of H1, the H3 output clock is used as the PLD clock input.


Figure 13-10 shows the state machine and equation for programming the
16R4 PLD ready logic. The PLD language shown in this figure is ABEL.
STRB0 is an input into the PLD that indicates that a valid TMS320C40 bus
cycle is occurring. RESET can also be used to bring the state machine back
to the idle state.


Notice that the RDY0 output of the PLD is not registered. An asynchronous
RDY0 signal is necessary to generate a ready signal for zero-wait-state
devices. When a zero-wait-state device is selected (ahi1 high in Figure 13-10)
and STRB0 is low, the PLD asserts RDY0 low within 7 ns. Hence, RDY0
goes active fast enough to satisfy the 20-ns setup time of RDY0 low before
H1 low.


For generation of RDY0 for one and two wait states, the device select
address bits and STRB0 are delayed one and two cycles, respectively, by the
PLD before RDY0 is brought active low. The one H3-cycle delay required
for one-wait-state device ready generation corresponds to state wait_one
in Figure 13-10, and the two H3-cycle delay required for two-wait-state
devices corresponds to states wait_twoa and wait_twob.


This 16R4 PLD-based design can be used to implement different numbers
of wait states for multiple devices. More devices can be selected with
TMS320C40 address lines, and a higher number of wait states can be
produced with additional PLD logic. Furthermore, this approach can be used in
conjunction with the TMS320C40’s internal wait-state generator.
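
The behavior of the ready logic described above can be approximated with a short simulation. This is a hedged sketch written for illustration, not a replacement for the ABEL source in Figure 13-10: the Python names are invented here, the state transitions follow the state machine as described in the text, and signals ending in `_n` are active low.

```python
IDLE, WAIT_ONE, WAIT_TWOA, WAIT_TWOB = "idle", "wait_one", "wait_twoa", "wait_twob"


def next_state(state, reset_n, ahi2, ahi3, strb0_n):
    """Registered state transition, evaluated once per H3 clock."""
    if state == IDLE:
        if reset_n and ahi2 and not strb0_n:
            return WAIT_ONE          # one-wait-state device selected
        if reset_n and ahi3 and not strb0_n:
            return WAIT_TWOA         # two-wait-state device selected
        return IDLE
    if state == WAIT_TWOA:
        return WAIT_TWOB if reset_n else IDLE
    return IDLE                      # wait_one and wait_twob return to idle


def rdy0_n(state, reset_n, ahi1, strb0_n):
    """Combinational (unregistered) ready output; 0 means ready."""
    ready = reset_n and (
        (ahi1 and not strb0_n) or state == WAIT_ONE or state == WAIT_TWOB
    )
    return 0 if ready else 1


def clocks_until_ready(ahi1=0, ahi2=0, ahi3=0, max_clocks=8):
    """Count H3 clocks from the start of a bus cycle until RDY0 goes low."""
    state = IDLE
    for clocks in range(max_clocks):
        if rdy0_n(state, 1, ahi1, 0) == 0:
            return clocks
        state = next_state(state, 1, ahi2, ahi3, 0)
    return None                      # no device selected: ready never asserted
```

Calling `clocks_until_ready(ahi1=1)`, `clocks_until_ready(ahi2=1)`, and `clocks_until_ready(ahi3=1)` shows the ready signal appearing immediately, after one H3 clock, and after two H3 clocks, respectively.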


13.4.6 Page Switching Techniques 


The ’C40’s programmable page switching feature can greatly ease system
design when large amounts of memory or slow external peripheral devices
are required. This feature can provide a time period for disabling all device
selects that would not normally be present otherwise (refer to subsection
7.3.2 on page 7-13 for further information regarding page switching). During
this interval, slow devices are allowed time to turn off before other devices
have the opportunity to drive the data bus, thus avoiding bus contention.


When page switching is enabled, any time a portion of the high-order
address lines changes, as defined by the contents of the STRB0 and STRB1
PAGESIZE fields (in the global and local memory interface control
registers), the corresponding STRB and PAGE go high for one full H1 cycle.
Provided that STRB is included in chip-select decodes, this causes all devices
selected by that STRB to be disabled during this period. The next page of
devices is not enabled until STRB and PAGE go low again.


If the high-order address lines remain constant during a read cycle, the
memory access is the same as that of a memory access without page
switching. In addition, page switching is not required during writes, because
these cycles exhibit an inherent one-half H1 cycle setup of address
information before STRB goes low. Thus, when you use page switching for
read/write devices, a minimum of half of one H1 cycle of address setup is
provided for all accesses outside a page boundary. Therefore, large amounts
of memory can be implemented without wait states or extra hardware
required for isolation between pages. Also, note that access time for cycles
during page switching is the same as that of cycles without page switching,
and, accordingly, full-speed accesses may still be accomplished within each
page.


The circuit shown in Figure 13-11 illustrates the use of page switching with
the Cypress Semiconductor™ CY7B185 15-ns 8K x 8 BiCMOS static RAM.
This circuit implements 32K 32-bit words of memory with full-speed
zero-wait-state accesses within each page.


Figure 13-10. State Machine and Equation for the 16R4 PLD 

module ready_generation

title 'ready generation logic for 0, 1 and 2
       wait state devices interfaced to
       TMS320C40'

c40u5 device 'P16R4';

"inputs
h3          Pin 1;

"The following are TMS320C40 address bits used to
"select the different speed devices. More can be used if
"necessary. In this example, a zero wait state, a one wait
"state, and a two wait state device are decoded with these
"three address bits

ahi1        Pin 2;   "when high selects zero wait state device
ahi2        Pin 3;   "when high selects one wait state device
ahi3        Pin 4;   "when high selects two wait state device
strb0_      Pin 5;   "indicates valid TMS320C40 bus cycle
reset_      Pin 6;   "reset signal from TMS320C40

"output
rdy0_       Pin 12;  "ready signal to TMS320C40

one_wait    Pin 14;  "internal flip-flop signal for 1 wait state
                     "device ready signal generation
two_waita   Pin 15;  "internal flip-flop signal for first of the
                     "wait states for 2 wait state devices
two_waitb   Pin 16;  "internal flip-flop signal for second
                     "of the two wait states for 2 wait
                     "state devices

"name substitutions for test vectors
c,H,L,X = .C.,1,0,.X.;

"state bits
outstate = [one_wait, two_waita, two_waitb];

idle      = ^b111;
wait_one  = ^b011;
wait_twoa = ^b101;
wait_twob = ^b110;

state_diagram outstate

state idle:
        if (reset_ & ahi2 & !strb0_) then wait_one
        else if (reset_ & ahi3 & !strb0_) then wait_twoa
        else idle;

state wait_one:
        GOTO idle;

state wait_twoa:
        if (reset_) then wait_twob
        else idle;

state wait_twob:
        GOTO idle;

equations

        !rdy0_ = reset_ & ((ahi1 & !strb0_) # !one_wait #
                 !two_waitb);

@page

"Test 1st level global arbitration logic

test_vectors

([h3,ahi1,ahi2,ahi3,strb0_,reset_] -> [outstate,rdy0_])

[ c, X, X, X, X, ...] -> [idle,      ...];
[ c, L, H, L, L, ...] -> [wait_one,  ...];
[ c, X, X, X, X, ...] -> [idle,      ...];
[ c, L, L, H, L, ...] -> [wait_twoa, ...];
[ c, X, X, X, X, ...] -> [idle,      ...];
[ c, L, L, H, L, ...] -> [wait_twoa, ...];
[ c, L, L, H, L, ...] -> [wait_twob, ...];
[ c, X, X, X, X, ...] -> [idle,      ...];
[ L, H, L, L, L, ...] -> [idle,      ...];
[ c, H, L, L, L, ...] -> [idle,      ...];
[ L, L, L, L, L, ...] -> [idle,      ...];
[ c, L, H, L, L, ...] -> [wait_one,  ...];
[ c, X, X, X, X, ...] -> [idle,      ...];
[ c, L, L, H, L, ...] -> [wait_twoa, ...];
[ c, L, L, H, L, ...] -> [wait_twob, ...];
[ c, H, L, L, L, ...] -> [idle,      ...];
[ c, X, X, X, H, ...] -> [idle,      ...];
[ c, X, X, X, H, ...] -> [idle,      ...];

end ready_generation


Figure 13-11. Page Switching for the Cypress Semiconductor CY7B185


A 5-ns ’16L8 PLD decodes lines A15-A13. These lines, along with STRB0,
select each of the four pages in this circuit. With the STRB0 PAGESIZE field
of the global memory interface control register set to 0Ch, the pages
are selected on even 8K-word boundaries, starting at location zero in
external memory space.


This circuit cannot be implemented without page switching, because the data
outputs’ turn-on and turn-off delays cause bus conflicts, and full-speed
accesses do not allow enough time for chip-select decoding for the four
pages. Here, the propagation delay of the 16L8 is involved only during page
switches, where there is sufficient time between cycles to allow new
chip-selects to be decoded.


The timing of this circuit for read operations using page switching is shown
in Figure 13-12. When a page switch occurs, the page address on address
lines A30-A13 is updated during the extra H1 cycle while STRB0 is high.
Then, after chip-select decodes have stabilized and the previously selected
page has disabled its outputs, STRB goes low for the next read cycle.
Further accesses occur at full speed with the normal bus timings, as long
as another page switch is not necessary. Write cycles do not require page
switching, because of the inherent address setup provided in their timings.
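
The cost of page switching on a stream of reads can be estimated with a few lines of code. This is a simplified sketch under stated assumptions: each zero-wait-state read is counted as one H1 cycle, each page change adds the one extra H1 cycle described above, the first access is not charged a page switch, and the function names are invented for this example. The 13-bit page offset corresponds to the 8K-word pages of the example circuit.

```python
PAGE_BITS = 13  # 8K-word pages, as in the example circuit (PAGESIZE = 0Ch)


def page_of(address, page_bits=PAGE_BITS):
    """Return the page number formed by the high-order address bits."""
    return address >> page_bits


def read_cycles(addresses, page_bits=PAGE_BITS):
    """Estimate H1 cycles for back-to-back zero-wait-state reads.

    One H1 cycle is counted per read, plus one extra H1 cycle whenever
    the page number differs from that of the previous read.
    """
    cycles = 0
    previous_page = None
    for address in addresses:
        page = page_of(address, page_bits)
        if previous_page is not None and page != previous_page:
            cycles += 1          # STRB and PAGE held high for one H1 cycle
        cycles += 1              # the read itself
        previous_page = page
    return cycles
```

Under these assumptions, `read_cycles([0x0000, 0x0001, 0x0002])` gives 3 cycles, while `read_cycles([0x1FFF, 0x2000])` gives 3 cycles for only two reads because the second address lies in the next 8K-word page.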


This timing is summarized in Table 13-3. 


Figure 13-12. Timing for Read Operations Using Bank Switching


Table 13-3. Page Switching Interface Timing

Time Interval   Event                                Time Period
t1              H1 falling to address/STRB valid
                STRB to select delay                 5 ns
                Memory disable from select           8 ns
                H1 falling to STRB
                STRB to select delay                 5 ns
                Memory output enable delay           3 ns


13.5 Parallel Processing Interfaces


The ’C40 communication ports and support for shared memory are the keys
to parallel processing design flexibility. Almost any number of processors
can be linked together in a wide variety of configurations. In this section,
Figure 13-14 (in three parts) illustrates ’C40 parallel processing
configurations that are used to fulfill many signal processing system needs.


13.5.1 Message Broadcasting From One TMS320C40 to Many TMS320C40s


Message broadcasting from one ’C40 to many ’C40s requires a simple
interface. A block diagram of such an interface is shown in Figure 13-13. To
simplify the interface, no token transferring is done. In this design, one ’C40
is the dedicated transmitter, and three ’C40s are dedicated receivers. No
reset circuitry is needed, because the transmitter is communication port 0
and the receivers are communication ports 3, 4, and 5. At reset, ’C40
communication ports 0, 1, and 2 are output ports, and communication ports
3, 4, and 5 are input ports. Due to this fixed communications configuration,
no token transfer is needed, allowing the CREQ and CACK pins of all
processors to be individually pulled up to 5 volts through 22-kΩ resistors.
Also, the STRB pins of the communicating processors can be tied together
along with the data lines CD7-0. However, if more than 5 receivers must be
driven by a single transmitter at the ’C40’s rated speed, the STRB and CD7-0
lines need to be buffered. Since the ’C40 communication port protocol is
asynchronous, if the speed of broadcast is not critical, buffers are not needed
as long as the number of receivers is less than 30. The CRDY signal input by
the transmitter communication port is generated by ORing the CRDY outputs
of all of the receiver communication ports. The transmitter should not receive
a CRDY signal until every receiver has received all data.


In addition, to ensure that the dedicated receiver ’C40s do not try to arbitrate
for the communication port bus, you should halt the output ports of the
receiver ’C40s by setting bit four of their communication port control registers
to one.
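
The two mechanisms described in this subsection, ORing the receivers' ready outputs and halting the receivers' output channels, can be illustrated with a small model. This is a hedged sketch: the function names and the constant name `OUTPUT_HALT_BIT` are hypothetical, the register is treated as a plain integer, and the ready signals are modeled as active low (0 means ready).

```python
OUTPUT_HALT_BIT = 4  # "bit four" of the communication port control register


def halt_output_channel(control_register_value):
    """Return the control register value with bit four set to one."""
    return control_register_value | (1 << OUTPUT_HALT_BIT)


def transmitter_ready_n(receiver_ready_n):
    """OR the active-low ready outputs of all receivers.

    The result is low (ready) only when every receiver is low, so the
    transmitter does not see ready until all receivers have taken the data.
    """
    combined = 0
    for level in receiver_ready_n:
        combined |= level
    return combined
```

For three receivers, `transmitter_ready_n([0, 0, 0])` returns 0 (ready), whereas `transmitter_ready_n([0, 1, 0])` returns 1 because one receiver has not yet accepted the data.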


Figure 13-13. Message Broadcasting by One ’C40 to Many ’C40s

(Block diagram: one transmitting ’C40 connected to the receiving ’C40s, with
22-kΩ pullup resistors to +5 V.)


13.5.2 Shared Global Memory Interface With Fair Bus Arbitration 


One of the most common multiprocessing system configurations is memory
shared by each processor in a system. Shared memory is typically
implemented by tying the processors’ data and address lines together.
However, the shared memory interface must guarantee that no more than
one processor is driving the shared bus at any one time; it must also allow
all processors sharing the bus to have a chance to access shared
resources.


The ’C40 supports shared memory multiprocessing with its identical global
and local port interfaces. Both interfaces have four status output signals,
(L)STAT3-0, which identify what type of access is attempting to begin on
the bus. These signals identify whether the ’C40 port is idle, a DMA read is
occurring, a STRB1 write is occurring, a LOCKed access to memory is
pending, etc. (as listed in Table 7-2, page 7-5). These signals can be
interpreted to issue single access or locked access bus requests to a shared
bus arbiter.


To support shared address control and data lines, the ’C40 provides the
(L)CE, (L)AE, and (L)DE input signals. When disabled (made high), these
signals three-state the control signals, address lines, and data lines,
respectively, of the port. These bus enable lines are asynchronous inputs to the
’C40, which can quickly turn off bus drivers when another processor is
accessing a shared resource. However, these signals asynchronously turn
off the ’C40’s local and global buses, without memory accesses being
suspended. To ensure that data written is seen externally and data read is
valid, the external (L)RDY should be used for wait-state generation in
shared memory designs. An (L)RDY signal should not be sent to the ’C40
until the processor has regained access to the bus (CE, AE, DE enabled)
and has had enough time to complete its access. Hence, with bus enable
and status signals, the flexible bus interfaces of the ’C40 allow high-speed
shared bus configurations to be implemented.


Figure 13-14. TMS320C40 Parallel DSP System Architectures

(Diagrams of ’C40s linked by communication port connections. The
recoverable panel descriptions are listed below.)

PIPELINED LINEAR ARRAY
For convolution and correlation and other pipelined operations
in graphics and modem applications.

Supports broadcasting and data searches for speech and
image recognition applications.

Clockwise and counterclockwise data flow. Group
port for more I/O. Very effective for neural networks.

Excellent for image processing.

HEXAGONAL GRID
6 nearest neighbor connection.
Useful in numerical analysis and image processing.

3-D GRID
For hierarchical processing such as image understanding and finite
element analysis.

4-D HYPERCUBE
A more general-purpose structure.

Memory interfaces support shared global memory and private local memory.

Architectures utilizing shared memory and communication ports are possible.

Memory interfaces also support shared global memory on the global bus and
the local bus.

A truly limitless variety of configurations is possible.


In this section, a ’C40 shared memory example is shown. Four ’C40s share
SRAM with their global buses tied together. A bus arbitrator implemented
as a programmable logic device provides a fair scheme for processor
access to the shared bus. The design shown here uses high-speed parts but
employs a fully asynchronous handshake protocol, which is still general,
allowing varying speed ’C40s and processors other than ’C40s to be added
to this bus configuration.


13.5.3 Shared Bus Interface Overview 


Figure 13-15 and Figure 13-16 are examples of shared memory
configurations. In these figures:

- Four ’C40s (each as shown in Figure 13-15) have their global buses
  tied together.

- Each shares 128K x 32 of one-wait-state SRAM.

- 64K of the memory is controlled by R/W0 and STRB0, and the other 64K is
  controlled by R/W1 and STRB1.


The memory devices are organized as 64K x 4, 35-ns SRAMs. Due to the
’C40’s bus enable signals (AE, DE, and CE), all four ’C40s’ data, address,
and control lines can be tied together for a shared memory configuration.
However, since 128K words of shared memory are being implemented on
the global bus (shown in Figure 13-15 and Figure 13-16), the common
address lines are buffered to provide adequate drive to the 16 required
memory devices. Also, the memories’ chip-enable lines are pulled up to 5
volts through 22-kΩ resistors to ensure that the memory devices are
disabled when no ’C40 is accessing them.


The required shared global bus interface logic consists of two levels of bus
arbitration logic implemented as programmable logic devices (PLDs). Each
of the ’C40s has an identical first level of logic that interfaces to the shared
second-level arbiter. The first level of logic for each of the four ’C40s consists
of one 7-ns 16R6 PLD and one 7-ns 16R4 PLD (center of Figure 13-15).
Each first-level 16R6 PLD receives status and control signals from the
corresponding ’C40, determines what kind of global bus transfer the associated
’C40 requires, and issues a global bus request signal to the global bus
controller (GBC, bottom of Figure 13-16), which, with the bus-grant time-out
counter, implements the second level of arbitration logic. The GBC is
implemented with a 7-ns 16R8 PLD, and the time-out counter is implemented
with a 7-ns 16R4 PLD. In addition to the two GBC PLDs, a 16L8 PLD is used
to issue write enable signals to the shared memory.


Since typical high-speed PLDs do not have many registered I/O pins or
multiple clock sources, each first-level 16R6 PLD uses a 16R4 PLD to
synchronize some of the input and output signals, and the 16R8 GBC PLD uses
external flip-flops to synchronize input signals.


If a ’C40 requires uninterrupted, multicycle global bus transfers, the
first-level PLD keeps its bus-request signal active until the uninterruptible cycles
are complete. The bus controller performs arbitration between the ’C40s
requesting the shared global bus. If a ’C40 is given access to the bus, the bus
controller sends its first-level PLDs a bus grant signal. The first-level PLD
then sends a bus enable signal to the ’C40, which brings its bus control,
address, and data signals out of high impedance. The first-level PLD also
sends a BUSRDY signal to the ’C40 to end each read or write cycle.
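
The request/grant exchange described above can be illustrated with a generic rotating-priority arbiter. This is only a sketch of one common fair-arbitration technique and is an assumption made for illustration: it is not the GBC's actual logic, it does not model the bus-grant time-out counter, and the function name and arguments are invented here.

```python
def arbitrate(requests, current_master, last_granted):
    """Pick the next bus master from a list of active requests.

    requests        list of booleans, one per processor (True = requesting)
    current_master  index of the processor that holds the bus, or None
    last_granted    index of the processor that was granted most recently

    A master that keeps its request active (for example during locked or
    other uninterruptible cycles) keeps the bus. Otherwise the search
    starts just after the last processor granted, so that priority rotates.
    """
    count = len(requests)
    if current_master is not None and requests[current_master]:
        return current_master
    for offset in range(1, count + 1):
        candidate = (last_granted + offset) % count
        if requests[candidate]:
            return candidate
    return None  # nobody is requesting the bus
```

With four processors, `arbitrate([True, True, False, False], None, 0)` selects processor 1, and the same requests with `last_granted=1` select processor 0, so neither requester can starve the other in this simplified model.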


Figure 13-15. TMS320C40 Shared Memory Interface 

Notes: 1) ... same configuration).
       2) The shared memory (shared by the four ’C40s) and global bus controller are shown
          in Figure 13-16 on the next page.
       3) The fixed/rotating priority is a programmable option at the global bus controller (GBC).


Figure 13-16. TMS320C40 Shared Memory and Bus Controller Interface


For full-speed operation, the ’C40s run from separate, 50-MHz
crystal-oscillator clock sources. For synchronization of shared bus control signals, the
H3 output clock of each ’C40 serves as the 16R4 PLD synchronizer clock
for first-level input and output signals. Also, the H1 output clock of each ’C40
serves as the state machine clock for each of the first-level 16R6 PLDs. In
addition, for high-speed bus controller synchronization, a 50-MHz crystal
oscillator is used as the input clock for the 16R8 GBC PLD, the 16R4
time-out generator PLD, and the GBC input signal synchronizers. (Note: for
fastest bus arbitration, the ’C40s sharing the bus can be synchronized by having
common RESET and CLKIN inputs. If the ’C40s are synchronized in this
way, the 50-MHz input to the second-level, global control PLDs can be the
common CLKIN.) The AS174 D flip-flops are used as GBC input signal
synchronizers.


Due to these arbitration synchronizer delays and the 35-ns SRAMs, access 
to the shared memory requires wait states. After an arbitration win, the first 
shared memory access requires three H1 cycles, and arbitration requires 
at least two H1 cycles from BUS REQUEST active to BUS ENABLE active. 
Figure 13-17 is a timing diagram of the arbitration contest. A bus master’s
first access after an arbitration win takes at least three H1 cycles; however,
subsequent read or write accesses require only two H1 cycles. The three
cycles required for the first access provide enough time for the old bus
master to stop driving the bus after an arbitration loss and enough time for new
bus master control signals to go active and inactive to complete 35-ns 
memory accesses. Also, three-cycle memory accesses allow enough time 
for signal buffering (buffer delays are less than 15 ns with commercially 
available parts) between the processor bus and memory. 
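
The cycle counts quoted above lend themselves to a quick estimate of burst access time. The sketch below is illustrative only: the function names are invented, the 40-ns H1 period is an assumption based on the 50-MHz clock input mentioned in the text, and the minimum arbitration delay is kept as a separate, optional term because the text does not state whether it overlaps the first access.

```python
H1_PERIOD_NS = 40.0            # assumption: 50-MHz CLKIN gives a 40-ns H1 cycle

FIRST_ACCESS_CYCLES = 3        # first access after an arbitration win
NEXT_ACCESS_CYCLES = 2         # each subsequent read or write
MIN_ARBITRATION_CYCLES = 2     # BUS REQUEST active to BUS ENABLE active


def burst_cycles(accesses, include_arbitration=False):
    """Return the minimum H1 cycles for a burst of shared memory accesses."""
    if accesses <= 0:
        return 0
    cycles = FIRST_ACCESS_CYCLES + NEXT_ACCESS_CYCLES * (accesses - 1)
    if include_arbitration:
        cycles += MIN_ARBITRATION_CYCLES
    return cycles


def burst_time_ns(accesses, include_arbitration=False):
    """Convert the cycle estimate to nanoseconds."""
    return burst_cycles(accesses, include_arbitration) * H1_PERIOD_NS
```

For instance, four consecutive accesses by the same bus master come to 3 + 2 + 2 + 2 = 9 H1 cycles, or 360 ns at the assumed 40-ns H1 period.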


The subsection that follows covers the global bus configuration for use with 
this shared memory configuration. 


13.5.3.1 Global Memory Interface Control Register (GMICR) Configuration 


For use in this shared memory configuration, the global bus should be
configured as follows in the GMICR:


SWW         = 00 (RDYint = RDYext)
STRB ACTIVE = 01111 (binary)
PAGESIZE    = 01111 (binary)
STRB SWITCH = 0


In addition, IIOF1 should be configured as a general-purpose output pin.
IIOF1 low signals that a high-priority DMA request is active.


Figure 13-17. Successful TMS320C40 Arbitration and Data Read From Shared Bus Memory Followed by an Unsuccessful Arbitration Contest


13.6 Bus Arbitration


13.6.1 Arbitration Implementation


Arbitration on the bus is implemented with two levels of logic. The first level
consists of four identically programmed 7-ns 16R6 PLDs and four
identically programmed 16R4 PLDs, with one 16R4 and one 16R6 associated with
each ’C40. The signals needed for arbitration from the ’C40 are STAT(3-0),
STRB0, STRB1, LOCK, and IIOF1 (PAGE can be used in designs where
page switching is necessary). IIOF1 should be configured as an output pin
and should be used to indicate that a high-priority DMA transfer is active.
Applications software should set IIOF1 low before priority DMA cycles are
started. Figure 13-18 illustrates graphically the state machine for each of
the first-level 16R6 PLDs. Each first-level PLD sends an active-low
BUSREQ signal to the global bus controller, the second level of arbitration logic.
The global bus controller sends a BUSGRANT signal to the requesting
’C40’s first-level logic when it has been granted control of the global bus. If
an interlocked or high-priority DMA bus request has been granted, the
first-level logic will keep its BUSREQ asserted low as long as interlocked or
priority DMA cycles are required. The bus controller will see BUSREQ remain
active and will give the current ’C40 bus master access to the bus until the
interlocked or priority DMA operations are complete.


After a high-priority DMA bus cycle is complete, the ’C40 applications
software should clear the priority DMA indication (set IIOF1 to logic level 1).
Likewise, interlocked access to memory should always end in a SIGI, STII, or
STFI operation to bring LOCK inactive. If priority accesses are completed by
making IIOF1 or LOCK inactive, the first-level PLD will always have an
opportunity to bring its BUSREQ inactive, preventing shared bus deadlock.


When the ’C40 associated with a first-level PLD is not the global bus master
(i.e., cannot access the global bus), the first-level PLD sends a logic level
one BUSREADY signal to that ’C40, extending any pending bus cycle until
after the ’C40 becomes bus master and has completed an access. In
addition, each first level of logic sends both a BUSENABLE and a CTLENABLE
signal to the corresponding ’C40. The BUSENABLE signal is connected to
the DE and AE pins, and CTLENABLE is connected to the CE pin of the
corresponding ’C40. These two signals place the bus control signals and the
address and data lines in high impedance when another ’C40 in the system
is accessing the shared bus.


Figure 13-18. Shared Bus Interface PLD State Machine

Notes: 1) In this state diagram, the output signals are all shown as active high for diagram clarity.
       2) A "!" in front of a signal indicates that it is not active (deasserted).
       3) & = logical AND of signals; # = logical OR of signals.


For proper system reset operation, a RESET signal clears bus requests to
the global bus controller and sends a logic level one BUSRDY and
BUSENABLE signal to each ’C40 to extend upcoming bus cycles and
three-state the bus until that ’C40 has been granted access to the bus.


Figure 13-19 shows equations for programming the 16R6 PLD used for the
first-level logic. The PLD language shown in this figure is ABEL. ABEL’s PLD
language is used to describe the state machine illustrated in Figure 13-18.


Note: Active-Low Indicators

In listings (e.g., Figure 13-19), an underscore following a signal name (e.g.,
busreq_) indicates the signal is active low. In regular text, such signals are
overbarred (e.g., BUSREQ).


The three PLD outputs (busreq_, busenable_, and busrdy_) are
used for three of the output state bits. The park_state and
start_state bits (used to indicate the park state and start state) are the
fourth and fifth output state bits. Also included in the ABEL description are
test vectors for the state machine.
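
To make the state encoding and transitions easier to follow, the first-level state machine can be modeled in a few lines. This is a hedged behavioral sketch, not the ABEL source: the Python names are invented, the state codes are taken as read from Figure 13-19 in the bit order [start_state, park_state, busreq_, busenable_, busrdy_], and where the park-state conditions could overlap the model simply gives priority to staying parked.

```python
STATE_CODES = {
    "idle": 0b11111, "req_cycle": 0b11011, "strt_cycl": 0b01001,
    "do_cycle": 0b11001, "fin_cycle": 0b11000, "park": 0b10001,
}
BIT_NAMES = ("start_state", "park_state", "busreq_", "busenable_", "busrdy_")


def decode(state):
    """Split a state code into its five output bits (most significant first)."""
    code = STATE_CODES[state]
    return {name: (code >> (4 - i)) & 1 for i, name in enumerate(BIT_NAMES)}


def next_state(state, reset_, idle_stat, bg_, lock_, pridma_,
               stat3, strb0_, strb1_):
    """Return the next state; inputs ending in '_' are active low."""
    if not reset_:
        return "idle"
    if state == "idle":
        return "idle" if idle_stat else "req_cycle"
    if state == "req_cycle":
        if idle_stat:
            return "idle"
        return "req_cycle" if bg_ else "strt_cycl"   # wait for bus grant
    if state == "strt_cycl":
        return "do_cycle"
    if state == "do_cycle":
        return "fin_cycle"
    if state == "fin_cycle":
        return "park"
    # park: release the bus once grant, lock, and priority DMA are inactive
    if bg_ and lock_ and pridma_:
        return "idle"
    if idle_stat:
        return "park"
    strobe_ = strb1_ if stat3 else strb0_            # stat3 selects the strobe
    return "do_cycle" if strobe_ else "fin_cycle"
```

For example, `decode("fin_cycle")["busrdy_"]` is 0, which reflects that the ready signal to the ’C40 is asserted only in the state that ends a bus cycle.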


The PLDs described in Figure 13-19 and Figure 13-20 work together to
interface to the GBC. Figure 13-20 (page 13-60) shows equations for
programming the 16R4 PLD used for synchronizing the first-level input and
output signals.


Figure 13-19. PLD Equations for Programming the 16R6 PLD (First-Level Logic)

module C40_local_glob_bus_interf

title '
DWG NAME  Local control (of shared bus arbitration logic)
DWG #
COMPANY   TEXAS INSTRUMENTS INCORPORATED'

c40u2 device 'P16R6';

"inputs for global interface logic

h1           Pin 1;   "clock input
priDMA_      Pin 2;   "flag req output used to signal
                      "priority DMA
stat3        Pin 3;   "stat3=0 STRB0 access, stat3=1
                      "STRB1 access
stat2        Pin 4;
stat1        Pin 5;
stat0        Pin 6;
strb0_       Pin 7;
strb1_       Pin 8;   "rdy signal from the external
                      "expansion connector
bg_          Pin 9;   "busgrant (from bus arbiter)
lock_        Pin 12;
reset_       Pin 19;

"outputs for global interface logic

start_state  Pin 18;  "low if the output state is the
                      "strt_cycle state
busreq_      Pin 17;
busenable_   Pin 16;
busrdy_      Pin 15;
park_state   Pin 14;  "low if the output state is the
                      "park state
gwe_         Pin 13;  "write enable signal for shared
                      "memory

"define machine state bits
"[start,park,busreq_,busenable_,busrdy_];

idle      = ^b11111;  "31
req_cycle = ^b11011;  "27
strt_cycl = ^b01001;  "09
do_cycle  = ^b11001;  "25
fin_cycle = ^b11000;  "24
park      = ^b10001;  "17

"convert to positive logic to make the test vectors easier to
"understand

lock      = !lock_;
bg        = !bg_;
priDMA    = !priDMA_;
idle_stat = (stat2 & stat1 & stat0);  "the bus is idle
                                      "when all are hi

"outstate
ost = [start_state,park_state,busreq_,
       busenable_,busrdy_];

c,H,L,X = .C.,1,0,.X.;

@page

state_diagram ost

state idle:
    case (!reset_ # idle_stat)    :idle;
         ( reset_ & !idle_stat )  :req_cycle;
    endcase;

state req_cycle:
    case (!reset_ # idle_stat)            :idle;
         ( reset_ & bg_ & !idle_stat )    :req_cycle;
         ( reset_ & !bg_ & !idle_stat )   :strt_cycl;
    endcase;

state strt_cycl:
    case (!reset_)  :idle;
         ( reset_)  :do_cycle;
    endcase;

state do_cycle:
    case (!reset_)  :idle;
         ( reset_)  :fin_cycle;
    endcase;

state fin_cycle:
    case (!reset_)  :idle;
         ( reset_)  :park;
    endcase;

state park:
    case (!reset_ # (bg_ & lock_ & priDMA_ ))  :idle;
         ( reset_ & idle_stat & (!bg_ # !lock_ # !priDMA_))
                                               :park;
         ( reset_ & !idle_stat & ((!stat3 & strb0_) #
           (stat3 & strb1_)) &
           (!bg_ # !lock_ # !priDMA_))         :do_cycle;
         ( reset_ & ((!stat3 & !strb0_) # (stat3 & !strb1_))
           & (!bg_ # !lock_ # !priDMA_))       :fin_cycle;
    endcase;

equations

    !gwe_ := reset_ & stat2 & !idle_stat & ((!busreq_ & !bg_) #
             !busenable_);
@page

"Test 1st level global arbitration logic
test_vectors
([h1,stat3,stat2,stat1,stat0,lock_,priDMA,strb0_,bg,strb1_,reset_] -> [ost,gwe_])

[ c, X, H, H, H, X, X, X, X, H, L] -> [idle,H];
[ c, X, H, L, H, H, L, X, L, H, H] -> [req_cycle,H];
[ c, X, H, H, H, X, X, X, X, H, L] -> [idle,H];

[ c, X, X, X, L, X, X, X, L, H, H] -> [req_cycle,H];
[ c, X, H, H, X, L, X, X, X, H, H] -> [strt_cycl,L];
[ c, X, H, H, H, X, X, X, X, H, L] -> [idle,H];
"vector 7
[ c, X, X, X, L, X, X, X, L, H, H] -> [req_cycle,H];
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Figure 13-19. PLD Equations for Programming the 16R4 PLD (First-Level Logic) (Continued) 


[ c, H, L, X, H, H, L, H, X, L, H] -> [park,H];
[ L, H, L, X, H, H, L, H, X, H, H] -> [park,H];
[ c, H, L, X, H, H, H, H, L, L, H] -> [fin_cycle,H];
[ c, H, L, X, H, H, H, H, X, L, H] -> [park,H];
[ L, X, L, X, H, H, H, H, X, H, H] -> [park,H];
[ c, L, L, L, H, L, X, H, H, H, H] -> [do_cycle,H];
[ c, L, L, L, H, H, L, H, L, H, H] -> [fin_cycle,H];
[ c, L, L, L, H, H, L, H, X, L, H] -> [park,H];
[ c, H, H, H, H, H, H, H, X, H, H] -> [park,H];
[ c, H, H, H, H, H, H, H, X, H, H] -> [park,H];
[ c, X, X, X, X, H, L, H, L, H, H] -> [idle,H];
"vector 37
[ c, X, H, H, H, L, X, H, L, H, H] -> [idle,H];
[ c, X, L, L, L, L, X, H, L, H, H] -> [req_cycle,H];
[ c, X, L, L, L, L, X, H, L, H, H] -> [req_cycle,H];
[ c, X, L, L, L, X, X, H, H, H, H] -> [strt_cycl,H];
[ c, X, L, L, L, L, X, H, H, H, H] -> [do_cycle,H];
[ c, X, L, L, L, H, L, H, L, H, H] -> [fin_cycle,H];
[ c, H, L, L, L, H, L, H, X, H, H] -> [park,H];
[ L, H, L, L, L, H, L, H, X, H, H] -> [park,H];
[ c, H, L, L, L, L, H, H, L, L, H] -> [fin_cycle,H];
[ c, H, L, L, L, L, H, H, X, L, H] -> [park,H];
[ L, X, L, L, L, L, H, H, X, H, H] -> [park,H];
[ c, H, L, L, L, L, H, H, H, H, H] -> [do_cycle,H];
[ c, H, L, L, L, L, H, H, L, L, H] -> [fin_cycle,H];
[ c, H, L, L, L, H, L, H, X, L, H] -> [park,H];
[ L, H, L, L, L, H, L, H, X, H, H] -> [park,H];
[ c, H, H, H, H, L, H, H, X, H, H] -> [park,H];
[ c, H, H, H, H, L, H, H, X, H, H] -> [park,H];
[ c, L, L, X, X, H, L, L, H, H, H] -> [fin_cycle,H];
[ c, L, L, X, X, H, L, H, X, H, H] -> [park,H];
[ c, X, X, X, X, H, L, H, L, H, H] -> [idle,H];
"vector 62
[ c, X, H, H, H, L, X, H, L, H, H] -> [idle,H];
[ c, X, L, X, H, L, X, H, L, H, H] -> [req_cycle,H];
[ c, X, L, X, H, X, X, H, H, H, H] -> [strt_cycl,H];
[ c, X, L, X, H, L, X, H, H, L, H] -> [do_cycle,H];
[ c, H, L, X, H, H, L, H, L, L, H] -> [fin_cycle,H];
[ c, H, L, X, H, H, L, H, X, L, H] -> [park,H];
[ L, H, L, X, H, H, L, H, X, H, H] -> [park,H];
[ c, H, L, X, H, H, H, H, L, L, H] -> [fin_cycle,H];
[ c, H, L, X, H, H, H, H, X, L, H] -> [park,H];
[ L, H, L, X, H, H, H, H, X, H, H] -> [park,H];
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Figure 13-19. PLD Equations for Programming the 16R4 PLD (First-Level Logic) (Concluded) 
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| [ c, H, H, L, H, L, X, H, H, H, H) -> [do cycle,lL]; 
[ c, H, H, L, H, H, L, H, L, H, H] -> [fin_cycle,L];
[ c, H, H, L, H, H, L, H, X, H, H] -> [park,L];
[ c, X, L, L, L, L, H, H, H, H, H] -> [do_cycle,H];
[ c, H, L, L, L, L, H, H, L, L, H] -> [fin_cycle,H];
[ c, H, L, L, L, H, L, H, X, L, H] -> [park,H];
[ L, H, L, L, L, H, L, H, X, H, H] -> [park,H];
[ c, L, L, X, X, H, L, L, H, H, H] -> [fin_cycle,H];
[ c, L, L, X, X, H, L, H, X, H, H] -> [park,H];
[ c, H, H, H, H, X, X, H, H, H, H] -> [park,H];
[ c, H, H, H, H, H, H, H, H, H, H] -> [park,H];
[ c, X, X, X, X, H, L, H, L, H, H] -> [idle,H];
@page
[ c, X, H, X, L, X, X, X, L, H, H] -> [req_cycle,H];
[ c, X, H, X, L, X, X, X, H, H, H] -> [strt_cycl,L];
[ c, X, H, X, L, X, X, X, H, H, H] -> [do_cycle,L];
[ c, X, H, X, L, X, X, H, H, H, H] -> [fin_cycle,L];
[ c, L, H, X, L, X, X, H, H, H, H] -> [park,L];
[ c, X, H, X, X, H, L, X, L, H, H] -> [idle,L];
[ c, X, H, L, L, X, X, X, L, H, H] -> [req_cycle,H];

"([h1,stat3,stat2,stat1,stat0,lock_,priDMA,strb0_,bg,strb1_,reset_] -> [outst,gwe_])

end c40_local_glob_bus_interf
| 
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The six PLD states are idle, request cycle, start cycle, 
do cycle, finish cycle, and park. 


1) After reset, the first-level PLD’s state machine starts in the idle
state and transitions to the request_cycle state when a global bus
transfer is required.

2) The transition to request_cycle occurs when any of the ’C40 status
lines (STAT2—0) are low (when the status lines are all high, the bus is
idle). In this state, the BUSREQ signal becomes active and is sent to the
GBC PLD.

3) When the PLD receives a BUSGRANT signal, the state machine
transitions to the start_cycle state. For the start_cycle, do_cycle,
finish_cycle, and park states, BUSREQUEST and BUSENABLE
are active.

4) From the start_cycle state, the state machine transitions to the
do_cycle state during the next H1 cycle.

5) From the do_cycle state, the state machine transitions to the
finish_cycle state in the next H1 cycle. In this state, the BUSRDY signal
is active. BUSRDY indicates to the ’C40 that the memory access has
been completed and that another access can be started.

6) From the finish_cycle state, the state machine transitions to the
park state during the next H1 cycle. BUSRDY goes inactive in
anticipation of another bus cycle starting.


Bus parking is implemented for this bus arbitration protocol to allow the cur- 
rent bus master to retain control of the bus and continue making accesses 
to global memory as long as consecutive interlocked or priority DMA cycles 
are required or if no other processor is requesting use of the bus. Bus
parking reduces memory access latency during any period in which only
one ’C40 needs access to the global bus.


Notice that when the state machine leaves the park state, allowing the
current bus master to perform another shared memory access, the state
machine can transition to either the finish_cycle or the do_cycle state,
depending on the level of STRB0 or STRB1. The STRB signal remaining low
between accesses indicates back-to-back read cycles, which require only
two H1 cycles to complete for 35-ns memories. Hence, the state machine
transitions from the park state directly to finish_cycle. If the STRB
signal goes high for one H1 cycle and then back low between accesses, the
state machine transitions from the park state to do_cycle, allowing one
cycle for the STRB high and two for the subsequent access.
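
The six states and the park-state exits described above can be modeled in a few lines of Python. This is only an illustrative sketch of the next-state logic in Figure 13-19, not a replacement for the ABEL source. The function name, argument names, and the use of True for a pin at logic high are assumptions made for the example. Where the park and fin_cycle conditions of the listing overlap (bus idle with STRB low), the sketch arbitrarily checks STRB first.

```python
IDLE, REQ, START, DO, FIN, PARK = (
    "idle", "req_cycle", "strt_cycl", "do_cycle", "fin_cycle", "park")

def next_state(state, reset_, stat, bg_, lock_=True, pri_dma_=True,
               strb0_=True, strb1_=True):
    """Return the state after the next H1 edge.

    All pin arguments are booleans, True = logic high (active-low pins are
    therefore inactive when True). stat is (stat3, stat2, stat1, stat0).
    """
    stat3, stat2, stat1, stat0 = stat
    idle_stat = stat2 and stat1 and stat0     # bus idle when STAT2-0 all high
    if not reset_:
        return IDLE                           # reset forces idle from any state
    if state == IDLE:
        return IDLE if idle_stat else REQ
    if state == REQ:
        if idle_stat:
            return IDLE
        return REQ if bg_ else START          # wait here until bus grant goes low
    if state == START:
        return DO
    if state == DO:
        return FIN
    if state == FIN:
        return PARK
    if state == PARK:
        keep_bus = (not bg_) or (not lock_) or (not pri_dma_)
        if not keep_bus:
            return IDLE                       # grant removed, nothing locks the bus
        strb_ = strb1_ if stat3 else strb0_   # STAT3 selects which strobe is tested
        if not strb_:
            return FIN                        # STRB stayed low: back-to-back access
        return PARK if idle_stat else DO      # STRB high: park, or start a new access
    raise ValueError(f"unknown state: {state}")
```

Stepping the function with a pending access (STAT2 low) and the grant asserted walks through req_cycle, strt_cycl, do_cycle, fin_cycle, and park in that order.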


The global bus controller (second-level logic) is implemented as a 16R8 
PLD. This PLD takes as inputs the outputs of each of the four first-level 
PLDs. Hence, the GBC has four BUSREQUEST signals as inputs — one 
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from each of the four, first-level logic PLDs associated with each of the 
’C40s. The GBC asserts four outputs, the BUSGRANT signals associated
with each ’C40’s first-level arbitration logic.


Figure 13-21 shows the state machine for the global bus controller. The
GBC asserts low the BUSGRANT signal associated with the ’C40 that wins an
arbitration contest. This BUSGRANT signal remains active until another
processor desires access to the shared bus and a time-out signal has been
received. The new contestant is not granted access to the shared bus until
after the current bus master deasserts (brings high) its BUSREQ signal,
indicating it has finished its priority accesses or single nonpriority
memory access. The system RESET signal should also be an input to the GBC
PLD. The system RESET signal should clear (deassert high) all BUSGRANT
signals to any of the ’C40s and return the GBC state machine to an idle
state.


The time-out signal is also a necessary input for the GBC because of the
high speed of bus arbitration. Before taking a BUSGRANT signal away, the
GBC must guarantee that a bus arbitration winner has had a chance to see
the BUSGRANT and start using the bus. The time-out signal is generated
by a counter implemented with a 16R4 PLD. The counter starts counting
when a processor first receives a BUSGRANT signal. It counts four cycles
and then issues a time-out signal to the GBC indicating that the GBC can
take away the current master’s BUSGRANT if necessary. Hence, the time-out
counter provides at least four cycles for a ’C40’s first level of logic
to see a BUSGRANT and start using the bus before the GBC can take the
BUSGRANT away. Figure 13-23 contains the ABEL PLD equations for the
time-out counter.
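
The counter behavior can be illustrated with a short Python sketch that follows the state names in Figure 13-23 (idle, count1, count2, count3, time). The function name and the list-of-booleans input are assumptions for the example. Each list element represents one GBC clock edge and is True when any BUSGRANT is active.

```python
def timeout_counter(bus_active_samples):
    """Return [(state, timeout_asserted), ...], one entry per clock edge."""
    order = ["idle", "count1", "count2", "count3", "time"]
    state = "idle"
    trace = []
    for active in bus_active_samples:
        if state == "time":
            state = "idle"                          # 'state time: GOTO idle'
        elif not active:
            state = "idle"                          # grant removed: restart count
        else:
            state = order[order.index(state) + 1]   # advance one count
        trace.append((state, state == "time"))
    return trace
```

With a grant held active, the time-out is asserted on the fourth clock edge and the counter returns to idle on the fifth.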


The type of arbitration implemented in this GBC example is a rotating
priority scheme, which provides fair arbitration among the four ’C40s
sharing the global bus. In a rotating priority scheme, the last bus
master becomes the lowest (last serviced) priority processor. The
processors sequentially rotate through the priority list, with the least
recently serviced processor having the highest priority in subsequent
arbitration contests. The priority rotates every time the bus request of
the current bus master goes inactive and another processor desires access
to shared memory. At system reset, the priorities are 1, 2, 3, and 4,
with 1 being the highest (first serviced) priority.
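
A minimal Python sketch of the rotating priority choice follows. It mirrors the order used in the Figure 13-22 state machine, where the processor after the last master is examined first. The function name and the set-of-requesters argument are assumptions for illustration. Passing last_master=None models arbitration from the idle state, where the order is 1, 2, 3, 4.

```python
def next_master(requests, last_master=None, n=4):
    """Pick the next bus master among processors 1..n.

    requests: set of processor numbers whose bus request is active.
    last_master: the processor that just released the bus, or None.
    Returns the winning processor number, or None if nobody else requests.
    """
    start = 1 if last_master is None else last_master % n + 1
    for k in range(n):
        candidate = (start - 1 + k) % n + 1   # walk the ring starting at 'start'
        if candidate in requests and candidate != last_master:
            return candidate
    return None
```

For example, after processor 3 has been master, the search order is 4, 1, 2, so a pending request from processor 1 wins over one from processor 2.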


Figure 13—22 shows PLD equations for programming the 16R8 PLD used 
to implement the rotating priority global bus controller. ABEL’s PLD lan- 
guage is used to describe the state machine shown in Figure 13-21. 
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Note: Active-Low Indicators 


In listings (e.g., Figure 13-22), an underscore following a signal name
(e.g., busreq_) indicates that the signal is active low. In regular text,
such signals are overbarred (e.g., BUSREQ).


The PLD’s four outputs are the four busgrant_ lines, with each line giving
a different ’C40 access to the shared bus. These four bits are also used as
half of the output state bits. The other four state bits are used to
indicate the ready state corresponding to each busgrant state. At reset,
the GBC state machine goes to the idle state. All busgrant_ signals are
inactive. In the idle state, br_ signals can be received from any of the
four first-level PLDs. After arbitration, the state machine makes a
transition to one of the grx states (where x = 1, 2, 3, or 4). The
corresponding busgrantx_ output signal goes active. The GBC stays in that
state until a busrequest_ and a time-out signal are received from another
processor’s first-level PLD. Once another busrequest_ is received, the
state machine transitions to the corresponding bryx state. In this state,
the busgrantx_ signal goes inactive. However, the GBC state machine stays
in this state until the corresponding bus request (brx_) input goes
inactive high, indicating that the current bus master has relinquished
control of the shared bus. When brx_ goes inactive, the state machine
changes to the gry state (where y = 1, 2, 3, or 4) of the highest priority
processor that has its br_ signal active.
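
The state encoding can be made concrete with a small Python decoder. It assumes the bit order given in Figure 13-22, bus_state = [s3,s2,s1,s0,bg4_,bg3_,bg2_,bg1_], with exactly one bit low in every state except idle. The function name and the returned tuple are choices made for this sketch only.

```python
def decode_bus_state(value):
    """Decode an 8-bit GBC state word into (state_name, granted_processor).

    Bits 3..0 are bg4_..bg1_ (active low); bits 7..4 are s3..s0, whose
    low level marks the 'bry' (ready) states bry1..bry4 respectively.
    """
    low_bits = [i for i in range(8) if not (value >> i) & 1]
    if not low_bits:
        return ("idle", None)                # all ones: no grant, no ready state
    if len(low_bits) != 1:
        raise ValueError(f"not a legal GBC state: {value:#010b}")
    bit = low_bits[0]
    if bit < 4:
        return (f"gr{bit + 1}", bit + 1)     # one bus grant line is active
    return (f"bry{8 - bit}", None)           # grant removed, waiting for brx_ high
```

Because the grant outputs double as state bits, a grx state is the only kind of state in which any busgrant_ line is low.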
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Figure 13-20. PLD Equations for Programming the 16R4 PLD 
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|module c40 global bus_interface 
title'
DWG NAME
DWG #
COMPANY    TEXAS INSTRUMENTS INCORPORATED'

c40u1 device 'P16R4';

"inputs
h3            Pin 1;
bg_           Pin 7;
busrdy_       Pin 8;    "busrdy from global interface PAL
busenable_    Pin 9;    "busenable from global interface PAL
"outputs
ctrl_enable_  Pin 18;   "enable signal for control lines
rdy_          Pin 17;   "rdy signal for shared SRAM
sync_ae_      Pin 15;   "synchronized busenable signal
bg_sync_      Pin 14;

"name substitutions
CE_  = ctrl_enable_;
bry_ = rdy_;

"substitutions for test vectors
c,H,L,X = .C.,1,0,.X.;

equations
sync_ae_      := busenable_;
!ctrl_enable_  = !sync_ae_ & !busenable_;
rdy_          := busrdy_;
bg_sync_      := bg_;

@page

"Test 1st level global arbitration logic
test_vectors
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Figure 13-20. PLD Equations for Programming the 16R4 PLD (Concluded) 
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({h3,busenable_,busrdy_,bg_]->[CE_,bry _,bg_sync_]}) 
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H]; 
H]; 
L); 
L]; 
H]; 
H]; 
H]; 
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H]; 
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c40_global_bus_interface 


Figure 13-21. Global Bus Controller PLD (Rotating Priority Mode Only)

[State diagram not reproduced. Legible transition labels: the idle state
is held while there are no requests (!br1 & !br2 & !br3 & !br4); a grant
state is held while no other processor requests the bus or the time-out
has not occurred, e.g. (!br2 & !br3 & !br4) # !timeout; a grant state is
left on a time-out with another request pending, e.g.
timeout & (br2 # br3 # br4).]
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Figure 13-22, PLD Equations for Programming the 16R8 PLD 


module global_bus_cntrl
title'
DWG NAME   Shared bus interface
DWG #
COMPANY    TEXAS INSTRUMENTS INCORPORATED'

xub5 device 'P16R8';

h50       Pin 1;    "50 MHz clock
br1_      Pin 2;    "bus request 1
br2_      Pin 3;    "bus request 2
br3_      Pin 4;    "bus request 3
br4_      Pin 5;    "bus request 4
reset_    Pin 6;    "reset
fix_rot_  Pin 7;    "fix/rot_ not used here but can be added
oe        Pin 11;
timeout_  Pin 8;
vss       Pin 10;

bg1_      Pin 19;   "grant 1
bg2_      Pin 18;   "grant 2
bg3_      Pin 17;   "grant 3
bg4_      Pin 16;   "grant 4
s3        Pin 15;   "state 3
s2        Pin 14;   "state 2
s1        Pin 13;   "state 1
s0        Pin 12;   "state 0
vcc       Pin 20;

c,H,L,X = .C.,1,0,.X.;

"define state machine bits
bus_state = [s3,s2,s1,s0,bg4_,bg3_,bg2_,bg1_];

"states
bry1 = ^b01111111;  "ready 1
bry2 = ^b10111111;  "ready 2
bry3 = ^b11011111;  "ready 3
bry4 = ^b11101111;  "ready 4
idle = ^b11111111;  "idle state

gr1  = ^b11111110;  "grant 1
gr2  = ^b11111101;  "grant 2
gr3  = ^b11111011;  "grant 3
gr4  = ^b11110111;  "grant 4

"convert inputs to positive logic
br1   = !br1_;
br2   = !br2_;
br3   = !br3_;
br4   = !br4_;
reset = !reset_;
@page
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Figure 13-22. PLD Equations for Programming the 16R8 PLD (Continued) 


state_diagram bus_state

state idle:
   if ( reset ) then idle
   else if ( !reset & !br1 & !br2 & !br3 & br4 ) then gr4
   else if ( !reset & !br1 & !br2 & br3 ) then gr3
   else if ( !reset & !br1 & br2 ) then gr2
   else if ( !reset & br1 ) then gr1
   else idle;

state bry4:
   if ( reset ) then idle
   else if ( !reset & br4 ) then bry4
   else if ( !reset & !br1 & !br2 & !br3 & !br4 ) then idle
   else if ( !reset & !br1 & !br2 & br3 & !br4 ) then gr3
   else if ( !reset & !br1 & br2 & !br4 ) then gr2
   else if ( !reset & br1 & !br4 ) then gr1;

state bry3:
   if ( reset ) then idle
   else if ( !reset & br3 ) then bry3
   else if ( !reset & !br4 & !br1 & !br2 & !br3 ) then idle
   else if ( !reset & !br4 & !br1 & br2 & !br3 ) then gr2
   else if ( !reset & !br4 & br1 & !br3 ) then gr1
   else if ( !reset & br4 & !br3 ) then gr4;

state bry2:
   if ( reset ) then idle
   else if ( !reset & br2 ) then bry2
   else if ( !reset & !br3 & !br4 & !br1 & !br2 ) then idle
   else if ( !reset & !br3 & !br4 & br1 & !br2 ) then gr1
   else if ( !reset & !br3 & br4 & !br2 ) then gr4
   else if ( !reset & br3 & !br2 ) then gr3;

state bry1:
   if ( reset ) then idle
   else if ( !reset & br1 ) then bry1
   else if ( !reset & !br2 & !br3 & !br4 & !br1 ) then idle
   else if ( !reset & !br2 & !br3 & br4 & !br1 ) then gr4
   else if ( !reset & !br2 & br3 & !br1 ) then gr3
   else if ( !reset & br2 & !br1 ) then gr2;

state gr4:
   if ( !reset & (timeout # !br1 & !br2 & !br3) ) then gr4
   else if ( reset ) then idle

state gr3:
   if ( !reset & (timeout # !br4 & !br1 & !br2) ) then gr3
   else if ( reset ) then idle

state gr2:
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Figure 13-22. PLD Equations for Programming the 16R8 PLD (Continued) 


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if (!reset &) (timeout # 


   !br3 & !br4 & !br1 )) then gr2
   else if ( reset ) then idle

state gr1:
   if ( !reset & (timeout # !br2 & !br3 & !br4 )) then gr1
   else if ( reset ) then idle

"rotating priority vectors
([h50,br1,br2,br3,br4,timeout_,reset_] -> [bus_state])
"check for go to IDLE
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[idle]; 
[gr4];
[idle];
[gr3];
[idle];
[gr2];
[idle];
[gr1];
[idle];
[gr4];
[bry4];
[idle];
[gr3];
[bry3];
[idle];
[gr2];
[bry2];
[idle];
[gr1];
[bry1];
[idle];


[idle];
[gr3];
[idle];
[gr2];
[idle];
[gr1];
[idle];
[gr4];
[bry4];
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Figure 13-22. PLD Equations for Programming the 16R8 PLD (Continued) 


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[idle]; 
[gr3];
[bry3];
[idle];
[gr2];
[bry2];
[idle];
[gr1];
[bry1];
[idle];


[gr1];
[gr1];
[gr1];
[bry1];
[bry1];
[bry1];


[gr4];
[gr4];
[gr4];
[gr4];
[bry4];
[bry4];
[bry4];


[gr3];
[gr3];
[gr3];
[gr3];
[bry3];
[bry3];
[bry3];


[gr2];
[gr2];
[gr2];
[gr2];
[bry2];
[bry2];
[bry2];


[gr1];
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Figure 13-22. PLD Equations for Programming the 16R8 PLD (Concluded) 


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C, 
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L, 
X, 
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global_bus_cntrl 




[bry1];
[gr3];
[bry3];
[gr1];
[bry1];
[gr2];
[bry2];
[gr4];
[bry4];
[gr2];
[bry2];


[gr3];
[bry3];
[gr4];
[bry4];
[gr1];
[bry1];
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0001 
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| Figure 13-23. PLD Equations for Programming the 16R6 PLD 


module c40_global_timeout
title'
DWG NAME   global arbitration
DWG #
COMPANY    TEXAS INSTRUMENTS INCORPORATED
DATE'

c40u4 device 'P16R6';
"inputs
h50       Pin 1;
bg1_      Pin 2;
bg2_      Pin 3;
bg3_      Pin 4;
bg4_      Pin 5;
timeout_  Pin 13;   "output
s1        Pin 16;
s0        Pin 15;

"name substitution to increase readability
bus_active = (!bg1_ # !bg2_ # !bg3_ # !bg4_);

"define machine state bits
"[timeout_,s1,s0];

"states
idle   = ^b111;
count1 = ^b110;
count2 = ^b101;
count3 = ^b100;
time   = ^b011;

outstate = [timeout_,s1,s0];

c,H,L,X = .C.,1,0,.X.;

state_diagram outstate

state idle:
   if (!bus_active) then idle
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Figure 13-23. PLD Equations for Programming the 16R6 PLD (Continued) 


   else count1;

state count1:
   if (!bus_active) then idle
   else count2;

state count2:
   if (!bus_active) then idle
   else count3;

state count3:
   if (!bus_active) then idle
   else time;

state time: GOTO idle;

@page
"Test counter
test_vectors
([h50, bg1_, bg2_, bg3_, bg4_] -> [outstate])
[ c, H, H, H, H ] -> [idle];
[ c, L, H, H, H ] -> [count1];
[ c, H, H, H, H ] -> [idle];
[ c, L, H, H, H ] -> [count1];
[ c, X, X, X, X ] -> [count2];
[ c, H, H, H, H ] -> [idle];
[ c, L, H, H, H ] -> [count1];
[ c, X, X, X, X ] -> [count2];
[ c, X, X, X, X ] -> [count3];
[ c, H, H, H, H ] -> [idle];
[ c, L, H, H, H ] -> [count1];
[ c, X, X, X, X ] -> [count2];
[ c, X, X, X, X ] -> [count3];

[ c, X, X, X, X ] -> [idle];
[ c, H, L, H, H ] -> [count1];
[ c, X, X, X, X ] -> [count2];
[ c, X, X, X, X ] -> [count3];
[ c, X, X, X, X ] -> [time];
[ c, X, X, X, X ] -> [idle];
[ c, H, H, L, H ] -> [count1];
[ c, X, X, X, X ] -> [count2];
[ c, X, X, X, X ] -> [count3];
[ c, X, X, X, X ] -> [time];
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Figure 13-23. PLD Equations for Programming the 16R6 PLD (Concluded) 
[ c, X, X, X, X ] -> [idle];
[ c, H, H, H, L ] -> [count1];
[ c, X, X, X, X ] -> [count2];
[ c, X, X, X, X ] -> [count3];
[ c, X, X, X, X ] -> [time];
[ c, X, X, X, X ] -> [idle];

end c40_global_timeout


13.6.2 Arbitration Alternatives 


If more arbitration flexibility is desired, a fixed priority mode can be
implemented in the global bus controller PLD. A fixed scheme can be used
in conjunction with this rotating priority mode if a fixed/rotating input
is added to the GBC PLD to select either of the two arbitration methods.
One of the spare IIOF pins can be configured as a general-purpose output
pin to act as the arbitration mode control pin. For example, if FIX/ROT
(IIOF2) = 0, the four ’C40s have rotating priorities; if FIX/ROT = 1, the
four processors have fixed priorities. To reduce state machine
complexity, the rotating priorities can be preset at system reset to the
same values as in the fixed arbitration mode, with the processors having
priorities of 1, 2, 3, and 4, with 1 being the highest (first serviced)
priority.


13.6.3 Global Bus Arbitration and Transfer Timing 


To illustrate the timing involved with global bus arbitration and data
transfers, Figure 13-17 (page 13-47), Figure 13-24 (page 13-72),
Figure 13-25, and Figure 13-26 show shared bus timings using the rotating
priority arbitration configuration.


These figures represent a ’C40 requesting a shared bus access when it is
not currently the bus master. Clock H1 is the output clock of the ’C40
requesting access to the bus. Both clocks H1 and H3 have a rate of 25 MHz;
however, the global bus controller (GBC) input clock is asynchronous with
respect to H1 and H3 and has a rate of 50 MHz.
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Due to the arbitration logic synchronizer delays and the 35-ns SRAMs, 
access to the shared memory requires wait states. A new bus master’s first 
memory access after an arbitration win takes at least five H1 cycles (the five 
cycles include the time period from status lines active to the end of the read 
or write cycle), but subsequent reads or writes take only two H1 cycles. Two- 
cycle memory accesses allow enough time for control signals to go active 
and inactive to complete read or write cycles for 35-ns memories. They also 
allow processors to stop driving the bus before another processor starts 
driving the bus after a bus arbitration contest. Also, the two-cycle memory 
accesses allow enough time for signal buffering between the processor bus 
and memory (buffer delays are less than 15 ns with commercially available 
parts). 
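
The cycle counts above translate into access times as follows. This Python sketch assumes the 25-MHz H1 clock stated earlier in this section (a 40-ns H1 period) and treats the five-cycle first access as a lower bound. The function name and default parameters are illustrative, not part of the design.

```python
def burst_time_ns(n_accesses, h1_mhz=25.0, first_cycles=5, next_cycles=2):
    """Minimum time for n back-to-back shared-memory accesses by one master.

    The first access after an arbitration win takes at least first_cycles
    H1 cycles; each following access takes next_cycles H1 cycles.
    """
    if n_accesses <= 0:
        return 0.0
    period_ns = 1000.0 / h1_mhz               # H1 period in nanoseconds
    cycles = first_cycles + (n_accesses - 1) * next_cycles
    return cycles * period_ns
```

A single access therefore takes at least 200 ns, while four consecutive accesses by the same master take at least 440 ns, an average of 110 ns each.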


In Figure 13-17 (page 13-47), a ’C40 wins an arbitration contest
immediately and does one read cycle. However, it loses arbitration for the
next transfer on the shared bus (busgrant_ goes inactive high), and the
first-level PLD brings its busrequest_ signal inactive high to signal the
GBC that it has given up the bus. At the same time, the first-level PLD
sends bus disable signals (BUSENABLE and CTLENABLE high) to the AE, DE,
and CE pins of the ’C40 to three-state the bus. The first-level PLD
three-states the bus immediately because the GBC will give another
processor access to the shared bus as soon as it sees this BUSREQUEST and
a time-out go inactive.


Figure 13-24 shows a successful arbitration contest followed by
successive reads. The ’C40 is allowed to do successive reads on the shared
bus because no other processor desires access (busgrant_ stays active).


Figure 13-25 illustrates an arbitration win followed by a single write.
Figure 13-26 shows an arbitration win followed by successive writes and
an arbitration loss. The second write is allowed to occur because the
busgrant_ going inactive is missed by the first-level PLD, which
synchronizes on H1 rising. The first-level PLD transitions to the do_cycle
state because STRB is high and the PLD has not seen the busgrant_ go
inactive from the synchronizer output. Even though the first-level PLD sees
that the busgrant_ is taken away during the next H1/H3 cycle, it does not
take away its busrequest_ until the end of the second write cycle. Then,
the busrequest_ is made inactive, and the bus is disabled.
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Figure 13-24. Successful TMS320C40 Arbitration; Data Read; Data Read 


[Timing diagram not reproduced. Signals shown include STAT(3-0), busreq_,
A(30-0), and D(31-0); the address bus leaves high impedance and carries
two successive valid addresses.]
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Figure 13-25. Successful TMS320C40 Arbitration and Data Write From Shared Bus Memory Followed 
by an Unsuccessful Arbitration Contest

[Timing diagram not reproduced. State sequence begins idle, req, req,
start and ends park, idle. Signals shown include STAT(3-0) (valid memory
access request, then pending memory access), br, A(30-0), STRB, and the
data bus; the address, strobe, and data lines are at high impedance
before and after the valid access.]
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Figure 13-26. Successful 'C40 Arbitration; Consecutive Data Writes; Arbitration Win Followed by 
Successive Writes and an Arbitration Loss 


[Timing diagram not reproduced. State sequence: idle, req, req, req,
start, do txfr, finish txfr, park, do txfr, finish txfr, park, idle.
Signals shown include bg, A(30-0), and D(31-0); the address and data
buses carry two successive valid addresses and valid write data, and are
at high impedance before and after.]


13.6.4 Arbitration Protocol Limitations 


This shared bus arbitration protocol uses handshaking between the GBC
and the processors sharing the global bus to ensure that only one
processor is driving the bus at any given time. Nonetheless, the global
bus controller should not allow another processor to become bus master
until the previous master is guaranteed to release the bus completely.
Since ’C40s have a bus disable (AE, DE, or CE) time of less than 15 ns,
bus turnoff time is not critical unless the GBC input clock frequency is
greater than 50 MHz. However, if processors with slower turnoff times are
used in a shared bus configuration with this protocol, the GBC input
clock period cannot be less than the bus disable time of the slowest
processor in the system. If the GBC input clock period is less than a
processor’s disable time, the GBC could give a new master ownership of
the bus before the previous master is off the bus.
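
The constraint can be checked numerically. The Python sketch below compares the GBC clock period with the slowest bus disable time in the system. The function name and the list argument are assumptions made for the example.

```python
def gbc_clock_ok(gbc_clock_mhz, bus_disable_times_ns):
    """True if the GBC clock period is at least the slowest bus disable time."""
    period_ns = 1000.0 / gbc_clock_mhz        # GBC input clock period in ns
    return period_ns >= max(bus_disable_times_ns)
```

With the 50-MHz GBC clock used here (20-ns period) and four processors that each disable the bus within 15 ns, the check passes; at 100 MHz (10-ns period) it would fail.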
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13.7 Reset Signal Generation Control Function 


Several aspects of ’C40 system hardware design are critical to overall sys- 
tem operation. One such function is reset signal generation. 


The reset input controls initialization of internal ’C40 logic and also
causes execution of the system initialization software. For proper system
initialization, the reset signal must be applied for at least ten H1
cycles, i.e., 400 ns for a ’C40 operating at 50 MHz. Upon powerup,
however, it can take 20 ms or more before the system oscillator reaches a
stable operating state. Therefore, the powerup reset circuit should
generate a low pulse on the reset line for 100 to 200 ms. Once a proper
reset pulse has been applied, the processor fetches the reset vector from
location zero, which contains the address of the system initialization
routine. Figure 13-27 shows a circuit that will generate an appropriate
powerup or push button reset signal.


Figure 13-27. Reset Circuit 


[Circuit diagram not reproduced. Labeled elements: +5 V supply,
C1 = 4.7 µF, and the TMS320C40.]


The voltage on the reset pin (RESET) is controlled by the R1C1 network.
After a reset, this voltage rises exponentially according to the time
constant R1C1, as shown in Figure 13-28. In Figure 13-27, the 74ALS34 is
used to provide a clean RESET signal to the ’C40.
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Figure 13-28. Voltage on the TMS320C40 RESET Pin 
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Voltage 


V=Vcc (1-e t/t) 


Vcc 


to=0 ty Time 
The duration of the low pulse on the RESET pin is approximately t1, which
is the time it takes for the capacitor C1 to be charged to 1.5 V. This is
approximately the voltage at which the reset input switches from a logic 0
to a logic 1. The capacitor voltage is expressed as

$$V = V_{CC}\left[1 - e^{-t/\tau}\right] \qquad (5)$$

where τ = R1C1 is the reset circuit time constant. Solving (5) for t
results in

$$t = -R_1 C_1 \ln\left[1 - \frac{V}{V_{CC}}\right] \qquad (6)$$

Setting the following:
R1 = 100 kΩ
C1 = 4.7 µF
VCC = 5 V
V = V1 = 1.5 V

results in t = 167 ms. Therefore, the reset circuit of Figure 13-27 provides
a low pulse long enough to ensure the stabilization of the system oscillator
upon powerup.
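As a quick numerical check of equation (6), the following Python sketch evaluates the pulse duration for the component values listed above. The function and argument names are illustrative only and are not part of the original text.

```python
import math


def reset_pulse_duration(r_ohms, c_farads, v_cc, v_threshold):
    """Time (in seconds) for an RC network charging toward v_cc to reach
    v_threshold, following t = -R*C*ln(1 - V/Vcc) from equation (6)."""
    if not 0.0 < v_threshold < v_cc:
        raise ValueError("threshold must lie strictly between 0 and v_cc")
    return -r_ohms * c_farads * math.log(1.0 - v_threshold / v_cc)


# Component values quoted in the text: 100 kOhm, 4.7 uF, 5 V supply, 1.5 V threshold.
t1 = reset_pulse_duration(100e3, 4.7e-6, 5.0, 1.5)
print(f"reset pulse duration: {t1 * 1e3:.1f} ms")
```

The result is close to the 167 ms quoted above and falls inside the 100 to 200 ms window recommended for the powerup reset pulse.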


Hardware Applications 


Note that if synchronization of multiple ’C40s is required, all processors
should be provided with the same input clock and the same reset signal.
After powerup, when the clock has stabilized, all processors may then be
synchronized by generating a falling edge on the common reset signal. Since
it is the falling edge of RESET that establishes synchronization, RESET
must be high for at least ten H1 cycles initially. Following the falling
edge, RESET should remain low for at least ten H1 cycles and then be driven
high. This sequencing of RESET may be accomplished by using additional
circuitry based on either RC time delays or counters.
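The ten-cycle requirement can be turned into concrete durations with a short Python sketch. It assumes that one H1 cycle equals two CLKIN periods, which is inferred from the figure of 400 ns for ten H1 cycles at 50 MHz given earlier in this section; the function names are hypothetical.

```python
H1_CYCLES_REQUIRED = 10  # minimum high time and minimum low time, per the text


def h1_period_ns(clkin_mhz):
    """H1 period in ns, assuming H1 runs at half the CLKIN frequency."""
    return 2000.0 / clkin_mhz


def min_reset_phase_ns(clkin_mhz):
    """Shortest allowed duration of each RESET phase (high, then low)."""
    return H1_CYCLES_REQUIRED * h1_period_ns(clkin_mhz)


def reset_sequence_ok(high_ns, low_ns, clkin_mhz):
    """True if RESET is high long enough before the falling edge and
    low long enough after it."""
    required = min_reset_phase_ns(clkin_mhz)
    return high_ns >= required and low_ns >= required


print(min_reset_phase_ns(50.0))  # 400.0 ns at 50 MHz, matching the text
```

A counter-based sequencer would only need to count at least this many nanoseconds' worth of clock cycles for each phase; an RC-based one would need margins on top of these minimums.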
Chapter 14

TMS320C4x Signal Descriptions and Electrical Characteristics

The sections in this chapter cover the following characteristics of
the TMS320C4x:

Section
14.1 Pinout and Pin Assignments
14.2 Signal Descriptions
14.3 TMS320C4x Mechanical Data
14.4 Electrical Specifications
14.5 Signal Transition Levels
14.6 Timing


Note: Advance Information

Unless otherwise noted, this chapter contains advance information on new
products in the sampling or preproduction phases of development.
Characteristic data and other specifications are subject to change without
notice.
14.1 Pinout and Pin Assignments

The TMS320C40 (TMS320C4x generation) digital signal processor is
available in a 325-pin grid array (PGA) package. The pinout of this package
is shown in Figure 14-1. Pin assignments are listed in the following tables:

- Table 14-1: Pins sorted by signal name (alphanumeric listing)
- Table 14-2: Pins sorted by pin number (location on Figure 14-1)
- Table 14-3: Pins sorted by function, describing each

Figure 14-1. TMS320C40 Pinout (Bottom View)

(Pin grid diagram, bottom view: rows lettered A through AR, columns
numbered 1 through 35.)
Table 14-1. TMS320C40 Pin Assignments Sorted by Signal Name


Signal        Pin
STRB0
STRB1
LSTAT1
LSTAT2
LSTAT3
LSTRB0        AJ3
PAGE0         AG33
PAGE1         AB32
RDY0
RESET
X2/CLKIN      AA1
              G2
RDY1          W31
RESETLOC0     AF30
RESETLOC1     AH34
              AK4
              AF32
              AC31
R/W0
Table 14-2. TMS320C40 Pin Assignments Sorted by Pin Number

RESETLOC0
R/W0
DVSS
LPAGE1
TCLK1
LDE
LSTRB1
IACK
LSTAT3
RDY1
LOCK
TMS
LSTAT2
LSTAT1
RDY0
TCK
14.2 Signal Descriptions

This section gives signal descriptions for the TMS320C40 device.
Table 14-3 lists each signal, the number of pins, function, and operating
mode(s), i.e., input, output, or high-impedance state as indicated by I, O,
or Z. All pins labeled NC are not to be connected by the user. A line over a
signal name (e.g., RESET) indicates that the signal is active low (true at a
logic 0 level). The signals are grouped according to function.


Table 14-3. TMS320C40 Signal Descriptions

Signal       Pins  Type‡  Description
Global Bus External Interface (80 pins)
D(31-0)      32    I/O/Z  32-bit data port of the global external interface
DE           1     I      Data bus enable signal for the global external interface
AE           1     I      Address bus enable signal for the global bus interface
STAT(3-0)    4     O      Status signals for the global bus interface
LOCK         1     O      Lock signal for the global bus interface
STRB0†       1     O/Z    Access strobe 0 for the global bus interface
RDY0         1     I      Ready signal for STRB0 accesses
CE0          1     I      Control enable for the STRB0, PAGE0, and R/W0 signals
PAGE1        1     O/Z    Page signal for STRB1 accesses
RDY1         1     I      Ready signal for STRB1 accesses
CE1          1     I      Control enable for the STRB1, PAGE1, and R/W1 signals

† STRB0 and STRB1 and associated signals (R/W1, R/W0, PAGE0, PAGE1, etc.)
  are effective over the address ranges defined by the STRB ACTIVE bits, as
  listed in Table 7-3 on page 7-8.

‡ I = input, O = output, Z = high impedance.
Table 14-3. TMS320C40 Signal Descriptions (Continued)

Signal       Pins  Type‡  Description
Local Bus External Interface (80 pins)
LD(31-0)     32    I/O/Z  32-bit data port of the local external interface
LDE          1     I      Data bus enable signal for the local external interface
LAE          1     I      Address bus enable signal for the local bus interface
LSTAT(3-0)   4     O      Status signals for the local bus interface
LLOCK        1     O      Lock signal for the local bus interface
LSTRB0†      1     O/Z    Access strobe 0 for the local bus interface
LR/W0        1     O/Z    Read/write signal for LSTRB0 accesses
LPAGE0       1     O/Z    Page signal for LSTRB0 accesses
LRDY0        1     I      Ready signal for LSTRB0 accesses
LCE0         1     I      Control enable for the LSTRB0, LPAGE0, and LR/W0 signals
LRDY1        1     I      Ready signal for LSTRB1 accesses
LCE1         1     I      Control enable for the LSTRB1, LPAGE1, and LR/W1 signals
Communication Port 0 Interface (12 pins)
C0D(7-0)     8     I/O    Communication port 0 data bus
CREQ0        1     I/O
CACK0        1     I/O
CSTRB0       1     I/O    Communication port 0 data strobe signal
CRDY0        1     I/O    Communication port 0 data ready signal

† LSTRB0 and LSTRB1 and associated signals (LR/W1, LR/W0, LPAGE0, LPAGE1,
  etc.) are effective over the address ranges defined by the STRB ACTIVE
  bits, as listed in Table 7-3 on page 7-8.

‡ I = input, O = output, Z = three-stated (high impedance).
Table 14-3. TMS320C40 Signal Descriptions (Continued)

Signal       Pins  Type‡  Description
Communication Port 1 Interface (12 pins)
C1D(7-0)     8     I/O    Communication port 1 data bus
CACK1        1     I/O    Communication port 1 token request acknowledge signal
Communication Port 2 Interface (12 pins)
C2D(7-0)     8     I/O    Communication port 2 data bus
CREQ2        1     I/O    Communication port 2 token request signal
CACK2        1     I/O    Communication port 2 token request acknowledge signal
CSTRB2       1     I/O    Communication port 2 data strobe signal
CRDY2        1     I/O    Communication port 2 data ready signal
Communication Port 3 Interface (12 pins)
C3D(7-0)     8     I/O    Communication port 3 data bus
CREQ3        1     I/O    Communication port 3 token request signal
CACK3        1     I/O    Communication port 3 token request acknowledge signal
CSTRB3       1     I/O    Communication port 3 data strobe signal
CRDY3        1     I/O    Communication port 3 data ready signal
Communication Port 4 Interface (12 pins)
C4D(7-0)     8     I/O    Communication port 4 data bus
CREQ4        1     I/O    Communication port 4 token request signal
CACK4        1     I/O    Communication port 4 token request acknowledge signal
CSTRB4       1     I/O    Communication port 4 data strobe signal
CRDY4        1     I/O    Communication port 4 data ready signal
Communication Port 5 Interface (12 pins)
C5D(7-0)     8     I/O    Communication port 5 data bus
CREQ5        1     I/O    Communication port 5 token request signal
CACK5        1     I/O    Communication port 5 token request acknowledge signal
CSTRB5       1     I/O    Communication port 5 data strobe signal
CRDY5        1     I/O    Communication port 5 data ready signal

‡ I = input, O = output, Z = three-stated (high impedance).
Table 14-3. TMS320C40 Signal Descriptions (Continued)

Signal         Pins  Type‡  Description
Interrupts, I/O Flags, Reset, Timer (12 pins)
IIOF(3-0)      4     I/O    Interrupt and I/O flags
NMI            1     I      Nonmaskable interrupt. It is sensitive to a
                            low-going edge.
RESETLOC(1,0)
ROMEN
X1
X2/CLKIN
H3
Emulation (7 pins)
                            JTAG test port data

‡ I = input, O = output, Z = three-stated (high impedance).
14.3 TMS320C4x Mechanical Data

Figure 14-2. TMS320C40 325-Pin PGA Dimensions

(Package outline drawing, top view, with the index mark at pin A1; rows
lettered A through AR and columns numbered 1 through 35. Dimensions called
out in the drawing: 1.860 ±0.019, 1.700 ±0.017, 0.050 Typ, 0.048 stand-off
pin (4 places), 0.020 Ref x 45°, 0.040 Ref x 45° (3 places), 0.120 ±0.012,
0.180 Typ, 0.018 ±0.002, 0.005 radius Typ, 0.016 ±0.010.)

Notes: Dimensions are in inches.

Package designator: GF.
14.4 Electrical Specifications

Table 14-4. Absolute Maximum Ratings Over Specified Temperature Range

Operating case temperature range      0°C to 85°C
Storage temperature range             -55°C to 150°C

Notes: 1) Stresses beyond those listed under "Absolute Maximum Ratings" may
          cause permanent damage to the device. This is a stress rating
          only; functional operation of the device at these or any other
          conditions beyond those indicated in the "Recommended Operating
          Conditions" table of this specification is not implied. Exposure
          to absolute-maximum-rated conditions for extended periods may
          affect device reliability.

       2) All voltage values are with respect to VSS.


Table 14-5. Recommended Operating Conditions 


Parameter                                  Min    Nom    Max
VSS    Supply voltages (CVSS, etc.)


Cont 
2.6 


IOH    High-level output current
IOL    Low-level output current
TA     Operating free-air temperature
VTH    High-level input voltage for CLKIN


Note: Note 1 for Table 14-4 also applies to this table. All input and
      output voltages are TTL compatible.


5.25 
08 - 
Table 14-6. Electrical Characteristics Over Specified Free-Air Temperature Range

Electrical Characteristic                                    Min   Nom (Note 1)   Max

VOH    High-level output voltage (VDD = Min, IOH = Max)
-10 10 
_ —400 20 


VOL    Low-level output voltage (VDD = Min, IOL = Max)

IP     Input current (inputs with internal pull-ups) (See Note 4)

ICC    Supply current (TA = 25°C, VDD = Max, fx = Max)       350 850


Notes: 1) All nominal values are at Vpp = 5 V, Ta = 25 °C. 
2) fy is the input clock frequency. The maximum value is 50 MHz. 
3) All input and output voltage levels are TTL compatible. 
4) Pins with internal pull-up devices: TDI, TCK. 
5) Pin with internal pull-down device: TRST. 


Figure 14-3. Test Load Circuit

(Schematic: tester pin electronics connected to the output under test
through a load referenced to VLoad, with load capacitance CT.)

Where:  IOL = 2.0 mA (all outputs)
        IOH = 300 µA (all outputs)
        VLoad = 2.15 V
        CT = 80 pF typical load circuit capacitance.


14.5 Signal Transition Levels


TTL-level outputs are driven to a minimum logic-high level of 2.4 volts and
to a maximum logic-low level of 0.6 volt. Output transition times are
specified as follows.


For a high-to-low transition on a TTL-compatible output signal, the level at
which the output is said to be no longer high is 2.0 volts, and the level at
which the output is said to be low is 1.0 volt. For a low-to-high transition,
the level at which the output is said to be no longer low is 1.0 volt, and
the level at which the output is said to be high is 2.0 volts.


Figure 14-4. TTL-Level Outputs 


Transition times for TTL-compatible inputs are specified as follows. For a
high-to-low transition on an input signal, the level at which the input is
said to be no longer high is 2.0 volts, and the level at which the input is
said to be low is 0.8 volt. For a low-to-high transition on an input signal,
the level at which the input is said to be no longer low is 0.8 volt, and
the level at which the input is said to be high is 2.0 volts.


Figure 14-5. TTL-Level Inputs
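The transition levels quoted in this section can be summarized in a small Python sketch that classifies a measured voltage as high, low, or in transition. The names and the choice to treat a voltage exactly at a threshold as belonging to the settled level are assumptions made for illustration, not part of the specification.

```python
# Transition levels in volts, taken from the text of this section.
OUTPUT_LEVELS = {"high": 2.0, "low": 1.0}
INPUT_LEVELS = {"high": 2.0, "low": 0.8}


def classify(voltage, levels):
    """Return 'high', 'low', or 'transition' for a TTL-compatible signal.

    Assumption: a voltage exactly equal to a threshold counts as the
    settled level (>= high threshold is high, <= low threshold is low).
    """
    if voltage >= levels["high"]:
        return "high"
    if voltage <= levels["low"]:
        return "low"
    return "transition"


print(classify(0.9, OUTPUT_LEVELS))  # low: at or below the 1.0 V output level
print(classify(0.9, INPUT_LEVELS))   # transition: above the 0.8 V input level
```

The example shows why the two tables of levels matter: the same 0.9 V reading counts as a settled low on an output but is still in transition when measured against the input levels.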
14.6 Timing


Figure 14-6. X2/CLKIN Timing


Figure 14-7. H1/H3 Timing
No.    Name       Description                     TMS320C40    TMS320C40-40
                  CLKIN fall time
       tw(CIL)    CLKIN low pulse duration,
                  tc(CI) = min
       tw(CIH)    CLKIN high pulse duration
                  Delay from H1(H3) low to
(10)   tc(H)      H1/H3 cycle time

† P = tc(CI), as shown in Figure 14-6.
Figure 14-8. Memory ((L)STRB = 0) Read

(Timing diagram of a memory read, including the (L)A bus and timing
parameter (6).)

Table 14-8. Timing Parameters for a Memory ((L)STRB = 0) Read/Write

No.    Name           TMS320C40    TMS320C40-40
(6)    th((L)D)R

Note: For consecutive reads, (L)R/W stays high and (L)STRB stays low.
Figure 14-9. Memory ((L)STRB = 0) Write


Table 14-8. Timing Parameters for a Memory ((L)STRB = 0) Read/Write (Concluded)

No.    Name               Description                        TMS320C40    TMS320C40-40
       td(H1H-(L)RWH)     H1 high to (L)R/W high (write)
(10)                      (L)D valid after H1 low (write)
                          (L)D hold time after H1 high
(12)                      H1 high to A valid on back-to-

Note: The delay for (L)RDY to become active after the address is valid
      should be a maximum of 13 ns for the ’C40 and 19 ns for the ’C40-40.
Figure 14-10. DE, AE, and CE Enable Timing

(Timing diagram of the enable inputs and the resulting Hi-Z and valid
states of the data, (L)STRB(0,1), and (L)PAGE(0,1) signals, with timing
parameters (1) through (10).)

Table 14-9. DE, AE, and CE Enable Timing

No.    Name             Description                            TMS320C40    TMS320C40-40
(1)    td(DEH-DZ)       Time (L)DE high to (L)D(31-0) Hi-Z
       td(CEH-RWZ)      Time (L)CE high to (L)R/W(0,1) Hi-Z
(6)    td(CEL-RWV)      Time (L)CE low to (L)R/W(0,1) valid
(7)    td(CEH-STRBZ)    Time (L)CE high to (L)STRB(0,1) in
                        high-impedance state
(8)                     Time (L)CE low to (L)STRB(0,1) valid
(9)    td(CEH-PAGEZ)    Time (L)CE high to (L)PAGE(0,1) in
                        high-impedance state
(10)   td(CEL-PAGEV)    Time (L)CE low to (L)PAGE(0,1) valid


Figure 14-11. Timing for (L)LOCK When Executing LDFI or LDII

(Timing diagram: (L)LOCK going low during the LDFI or LDII external
access.)

Table 14-10. Timing Parameters for (L)LOCK When Executing LDFI or LDII

Name             Description
td(H1L-LOCKL)    H1 low to (L)LOCK low
Figure 14-12. Timing for (L)LOCK When Executing STFI or STII

(Timing diagram: (L)LOCK going high during the STFI or STII external
access.)

Table 14-11. Timing Parameters for (L)LOCK When Executing STFI or STII

Name             Description               TMS320C40    TMS320C40-40
td(H1L-LOCKH)    H1 low to (L)LOCK high
Figure 14-13. Timing for (L)LOCK When Executing SIGI

(Timing diagram including (L)STAT(3-0).)

Table 14-12. Timing Parameters for (L)LOCK When Executing SIGI
Figure 14-14. Timing Parameters for (L)PAGE(0,1)

(Timing diagram showing (L)R/W1, (L)STRB1, (L)RDY, (L)D31-D0, and
(L)STAT3-STAT0.)

Table 14-13. Timing Parameters for (L)PAGE(0,1) During Memory Accesses to a Different Page

No.    Name          Description                          TMS320C40    TMS320C40-40
(1)    td(H1L-PH)    H1 low to PAGE high for access to
                     different page
(2)    td(H1L-PL)    H1 low to PAGE low for access to
                     different page
Figure 14-15. Timing for Loading IIF Register (IIOF Pins) When Configured as an Output Pin

(Timing diagram across the fetch, decode, read, and execute phases of the
load instruction, showing the FLAG bit changing to 1 or 0 and the IIOF pin
following after timing parameter (1).)

Table 14-14. Timing Parameters for Loading IIF Register When Configured as an Output Pin

Name            Description             TMS320C40    TMS320C40-40
tv(H1L-IIF)     H1 low to IIOF valid
Figure 14-16. Change of IIOF From Output to Input Mode

(Timing diagram showing H1, the TYPE bit, the FLAG bit, and the IIOF pin
across four phases: execution of the load of IIOF, buffers going from
output to input, synchronizer delay, and value on pin seen in IIOF.)

Table 14-15. Timing Parameters of IIOF Changing From Output to Input Mode

Name             Description                 TMS320C40    TMS320C40-40
th(H1L-IIFOI)    IIOF hold after H1 low
                 IIOF setup before H1 low
                 IIOF hold after H1 low
Figure 14-17. Change of IIOF From Input to Output Mode

(Timing diagram showing the TYPE bit during execution of the load of
IIOF.)

Table 14-16. Timing Parameters of IIOF Changing From Input to Output Mode

Description                              TMS320C40    TMS320C40-40
H1 low to IIOF switching from input
Figure 14-18. RESET Timing

(Timing diagram showing H1, (L)D (Note 1), (L)A (Note 2), the control
signals (Note 3), (L)PAGE(0,1) (Note 3), the asynchronous reset signals
(Note 4), and the asynchronous signals (Note 5), with a span of 10 H1 clock
cycles marked.)

NOTE: Timing parameters are in Table 14-17.

Notes: 1) (L)D includes D(31-0), LD(31-0), and CxD(7-0).

       2) (L)A includes A(30-0).

       3) Control signals LSTRB0, LSTRB1, STRB0, STRB1, (L)STAT(3-0),
          (L)LOCK, (L)R/W0, and (L)R/W1 go high while (L)PAGE0 and (L)PAGE1
          go low.

       4) Asynchronously reset signals that go into high impedance after
          RESET goes low include TCLK0, TCLK1, IIOF(3-0), and the
          communication port control signals CREQx, CACKy, CSTRBy, and
          CRDYx (where x = 0, 1, or 2, and y = 3, 4, or 5). (At reset,
          ports 0, 1, and 2 become outputs, and ports 3, 4, and 5 become
          inputs.)

       5) Asynchronously reset signals that go to a high logic level after
          RESET goes low include CREQy, CACKx, CSTRBx, and CRDYy (where
          x = 0, 1, or 2, and y = 3, 4, or 5).

       6) RESET is an asynchronous input and can be asserted at any point
          during a clock cycle. If the specified timings are met, the exact
          sequence shown will occur; otherwise, an additional delay of one
          clock cycle may occur.


Table 14-17. Timing Parameters for RESET

No.      Name                 Description
                              Setup for RESET before
         td(CLKINH-H1L)       CLKIN high to H1 low
         tsu(RESETH-H1L)      Setup for RESET high before H1 low and after
                              10 H1 clock cycles
(4.1)    td(CLKINH-H3L)       CLKIN high to H3 low
(4.2)    td(CLKINH-H3H)       CLKIN high to H3 high
         tdis(H1H-XD)         H1 high to (L)D high-impedance
                              H3 high to (L)A high-impedance
         td(H3H-CONTROLH)     H3 high to control signals high (low for
                              (L)PAGE)
         td(H1H-IACKH)        H1 high to IACK high
                              RESET low to asynchronously reset signals
                              high-impedance

† P = tc(CI), the CLKIN period as shown in Figure 14-6.
Figure 14-19. IIOF(3-0) Interrupt Response Timing

(Timing diagram showing the IIOF(3-0) pin, the IIOF(3-0) flag, and the
address and data activity during the reset or interrupt vector read and
the fetch of the first instruction of the service routine, with timing
parameters (1) and (2).)

Table 14-18. Timing Parameters for IIOF(3-0)

                                                   TMS320C40          TMS320C40-40
No.    Name        Description                     Min  Typ   Max     Min  Typ   Max
(1)                IIOF(3-0) setup before H1
(2)    tw(IIOF)    Interrupt pulse width to        P    1.5P  <2P     P    1.5P  <2P
       (See        guarantee one interrupt seen
       Note 1)

Notes: 1) Interrupt pulse width must be at least 1 P wide (P = one H1
          period) to guarantee it will be seen. It must be less than 2 P
          wide to guarantee it will be responded to only once. Recommended
          pulse width is 1.5 P.

       2) IIOF is an asynchronous input and can be asserted at any point
          during a clock cycle. If the specified timings are met, the exact
          sequence shown will occur; otherwise, an additional delay of one
          clock cycle may occur.

       3) The ’C40 can accept an interrupt from the same source every two
          H1 clock cycles.

       4) For edge-triggered interrupts, only timing number (1) applies.
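The pulse-width rule in Note 1 can be expressed as a short Python sketch. The function names and the returned strings are illustrative choices; only the limits of 1 P, 1.5 P, and 2 P come from the note.

```python
def interrupt_pulse_status(width_ns, h1_period_ns):
    """Classify an interrupt pulse width against the limits in Note 1."""
    if width_ns < h1_period_ns:
        return "may be missed"            # shorter than 1 P
    if width_ns < 2 * h1_period_ns:
        return "seen exactly once"        # at least 1 P, less than 2 P
    return "may be seen more than once"   # 2 P or longer


def recommended_pulse_width_ns(h1_period_ns):
    """Recommended width of 1.5 P."""
    return 1.5 * h1_period_ns


P = 40.0  # H1 period in ns, the minimum value quoted elsewhere in this chapter
print(recommended_pulse_width_ns(P))       # 60.0
print(interrupt_pulse_status(60.0, P))     # seen exactly once
```

Choosing 1.5 P places the pulse in the middle of the allowed window, which leaves equal margin against both a missed interrupt and a repeated one.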
Figure 14-20. IACK Timing

(Timing diagram showing H3, H1, IACK, and the address and data buses during
the fetch of the IACK instruction and the IACK data read, with timing
parameters (1) and (2).)

Table 14-19. Timing Parameters for IACK

No.    Name             Description
(1)    td(H1H-IACKL)    H1 high to IACK low
(2)    td(H1H-IACKH)    H1 high to IACK high during first cycle of IACK
                        instruction data read

Note: The IACK output is active for the entire duration of the bus cycle
      and is therefore extended if the bus cycle utilizes wait states.
Figure 14-21. Communication-Port Word-Transfer Cycle Timing


Note: For correct operation during token exchange, the two communicating
      ’C40s must have CLKIN frequencies within a factor of 2 of each other
      (in other words, at most, one of the ’C40s can be twice as fast as
      the other).
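The factor-of-2 constraint in the note above is easy to check when pairing processors. The following Python sketch is a minimal illustration; the function name is hypothetical.

```python
def clkin_frequencies_compatible(f_a_mhz, f_b_mhz):
    """True if the two CLKIN frequencies are within a factor of 2 of each
    other, as the note requires for token exchange."""
    if f_a_mhz <= 0 or f_b_mhz <= 0:
        raise ValueError("frequencies must be positive")
    fast, slow = max(f_a_mhz, f_b_mhz), min(f_a_mhz, f_b_mhz)
    return fast <= 2.0 * slow


print(clkin_frequencies_compatible(50.0, 40.0))  # True
print(clkin_frequencies_compatible(50.0, 20.0))  # False: ratio is 2.5
```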


Table 14-20. Communication-Port Word-Transfer Cycle Timing

                                              TMS320C40†            TMS320C40-40†
No.    Name          Description              Min‡      Max‡        Min‡      Max‡
(1)    tc(WORD)                               1.5P+46   2.5P+202    1.5P+46   2.5P+202
(2)    td(RL-SL)W    CRDY low to CSTRB low    1.5P+7    2.5P+28     1.5P+7    2.5P+28
                     between back-to-back
                     write cycles

† P is the duration of the H1 clock period with a minimum value of 40 ns
  (P = 40 ns).
‡ For these timing values, it is assumed that the ’C40 receiving data is
  ready to receive data.
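Because the limits in Table 14-20 are given as expressions in P, it can help to evaluate them for a concrete H1 period. The Python sketch below does this for the two rows of the table; the function names are hypothetical, and the reading of row (1) as the word-transfer cycle time is an assumption based on the table title.

```python
def word_cycle_bounds_ns(p_ns):
    """(min, max) of row (1): 1.5P + 46 and 2.5P + 202, in ns."""
    return 1.5 * p_ns + 46.0, 2.5 * p_ns + 202.0


def back_to_back_write_bounds_ns(p_ns):
    """(min, max) of row (2), CRDY low to CSTRB low between back-to-back
    write cycles: 1.5P + 7 and 2.5P + 28, in ns."""
    return 1.5 * p_ns + 7.0, 2.5 * p_ns + 28.0


P = 40.0  # minimum H1 period in ns, from the table footnote
print(word_cycle_bounds_ns(P))          # (106.0, 302.0)
print(back_to_back_write_bounds_ns(P))  # (67.0, 128.0)
```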
Figure 14-22. Communication Port Byte Timing (Write and Read)

(Timing diagrams of CSTRB, CD(7-0), and CRDY: (a) write timing, (b) read
timing.)

Table 14-21. Communication Port Byte Timing (Write and Read)

No.    Name          Description
       td(RL-SH)W    CRDY low to CSTRB high (write)
       th(CD)W       CD hold after CRDY low (write)
                     Byte period
(8)    th(CD)R       CD held valid after CRDY low (read)
(9)    td(SH-RH)R    CSTRB high to CRDY high (read)
Figure 14-23. Communication Token Transfer Sequence From an Input to an Output Port

(Timing diagram. Shaded regions indicate when a signal is an input; clear
regions indicate when a signal is an output.)

Note: Before the token exchange, CREQ and CRDY are output signals asserted
      by the ’C40 that is receiving data. CACK, CSTRB, and CD(7-0) are
      input signals asserted by the device sending data to the ’C40; these
      are asynchronous with respect to the H1 clock of the receiving ’C40.
      After token exchange, CACK, CSTRB, and CD(7-0) become output signals,
      and CREQ and CRDY become inputs.
Table 14-22. Communication Token Transfer Sequence From an Input to an Output Port
(Figure 14-23)


7 | | TMS320C40 

_ Name Description Mint Maxt 
CACK low to CSTRB 

(1)T | td(AL-SO)T 


change from input to a high- 
td(AL—RQH)T 


0 


0.5P+6 1.5P+22 | 0.5P+6 1.5P+ 22 


level output 


CACK low to start of CREQ 
going high for token request 
acknowledge 


Start of CREQ going high to 
CREQ change from output to 
an input 


Start of CREQ going high to 
CACK change from an input 
to an output level high 


0.5P+13 
Start of CREQ going high to 
CD(7—0) change from inputs 


0.8P-5 0.5P+13 
driven to outputs driven 
Start of GREQ going high to | | 
(4.2) td(RQH-RI)T | CRDY change from an out- 0.5P+13 | 0. 0.5P+13 


P+5 2P+20] P+5 2P + 20 


0.5P+13 0.5P+ 13 


0.5P+13 


0.5P+13 


Start of CREQ going high to 
CSTRB low for start of word 
transfer out 


put to an input 
0.5P-8 1.5P+9 


3.5P+12 5.5P+48 | 3.5P+12 5.5P+ 48 


CRDY low at end of word 
input to CSTRB low for word 
output : 


† These timing parameters result from synchronizer delays and are
  referenced from the falling edge of H1. The inputs (that cause the
  output-signal pins to change values) are sampled on H1 falling. The
  minimum delay occurs when the input condition occurs just before H1
  falling, and the maximum delay occurs when the input condition occurs
  just after H1 falling.
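The spread between the minimum and maximum values in this footnote comes from where the input change lands relative to the H1 falling edge that samples it. The Python sketch below models only that waiting time, assuming idealized sampling edges at integer multiples of the H1 period P; the function name and the edge placement are illustrative assumptions.

```python
import math


def wait_for_sample_ns(event_ns, h1_period_ns):
    """Time from an input change until the next H1 falling edge, with the
    falling edges assumed to occur at 0, P, 2P, ... (an idealized model)."""
    next_edge = math.ceil(event_ns / h1_period_ns) * h1_period_ns
    return next_edge - event_ns


P = 40.0
# Input changes 1 ns before the falling edge at 40 ns: sampled almost at once.
print(wait_for_sample_ns(39.0, P))  # 1.0
# Input changes 1 ns after that edge: it waits nearly a full period.
print(wait_for_sample_ns(41.0, P))  # 39.0
```

This is why each synchronized parameter has a maximum roughly one H1 period larger than its minimum, on top of the fixed propagation terms.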
Figure 14-24. Communication Token Transfer Sequence From an Output to an Input Port

(Timing diagram. Shaded regions indicate when a signal is an input; clear
regions indicate when a signal is an output.)

Note: Before the token exchange, CACK, CSTRB, and CD(7-0) are asserted by
      the ’C40 sending data. CREQ and CRDY are input signals asserted by
      the ’C40 receiving data and are asynchronous with respect to the H1
      clock of the sending ’C40. After token exchange, CREQ and CRDY become
      outputs, and CSTRB, CACK, and CD(7-0) become inputs.
Table 14-23. Communication Token Transfer Sequence From an Output to an Input Port
(Figure 14-24)

                                                        TMS320C40          TMS320C40-40
No.     Name            Description                     Min      Max       Min      Max
(1)†    td(RQL-AL)T     CREQ low to start of CACK       P+5      2P+22     P+5      2P+22
                        going low for token request
                        acknowledge
(2)†    td(RL-AL)T      CRDY low at end of word         P+6      2P+27     P+6      2P+27
                        transfer out to start of CACK
                        going low
(3)     td(AL-CDI)T     Start of CACK going low to      0.5P-8   0.5P+8    0.5P-8   0.5P+8
                        CD(7-0) change from outputs
                        to inputs
(4)     td(AL-RO)T      Start of CACK going low to      0.5P-8   0.5P-8    0.5P-8   0.5P-8
                        CRDY change from an input to
                        output, high level
(5)†    td(RQH-RQO)T    CREQ high to CREQ change from
                        an input to output, high level
(6)†    td(RQH-AI)T     CREQ high to CACK change from
                        output to an input
(7)†    td(RQH-SI)T     CREQ high to CSTRB change from
                        output to an input
(8)                     CREQ high to CREQ low for a

† These timing parameters result from synchronizer delays and are
  referenced from the falling edge of H1. The inputs (that cause the
  output-signal pins to change values) are sampled on H1 falling. The
  minimum delay occurs when the input condition occurs just before H1
  falling, and the maximum delay occurs when the input condition occurs
  just after H1 falling.


14-36 TMS320C4x Signal Descriptions and Electrical Characteristics 


Figure 14-25. Timer Pin Timings

(Timing diagram of the clock signals and the peripheral pin, with timing
parameters (2) and (3).)

Table 14-24. Timing Parameters for Timer Pin

No.    Name            Description                  TMS320C40    TMS320C40-40
(2)    th(TCLK-H1L)    TCLK hold after H1 low
(3)    td(TCLK-H1H)    TCLK valid after H1 high

Note: Period and polarity of valid logic level are specified by contents of
      internal control registers.
(Timing diagram: TMS/TDI, with timing parameters (1) and (2).)

Description                     TMS320C40    TMS320C40-40
                                Min  Max
Appendix A

TMS320C4x Sockets

This appendix describes sockets available to accept the TMS320C4x pin
grid array (PGA). Both sockets covered in this appendix feature zero
insertion force (ZIF):

- a tool-activated ZIF socket (TAZ)
- a handle-activated ZIF socket (HAZ)

The sockets described herein are manufactured by AMP Incorporated®.


A.1 Tool-Activated ZIF PGA Socket (TAZ)

Figure A-1. Tool-Activated ZIF Socket

(Socket drawing; dimension shown: 0.350 in. max.)

This socket requires AMP™ actuator tool: 354234-1

Description:

- AMP part number: 382533-9
- pin positions: 325
- solder tail length: 0.170 in. for PC boards 0.125 in. thick (other tail
  lengths available)

Features:

- slightly larger than PGA device
- easy package loading because of large funnel entry
- zero insertion force
- contact wiping action during insertion ensures clean contact points
- spring-loaded cover ensures proper loading
- can be used with robotic insertion and removal
- its horizontal socket forces (vs. vertical) prevent damage to device


TMS320C4x Sockets 


A.2 Handle-Activated ZIF PGA Socket (HAZ)

Figure A-2. Handle-Activated ZIF Socket

(Socket drawing; dimensions shown: 2.700 in. max., 0.350 in. max., and
0.650 in. max.)

Description:

- AMP part number: 382320-9
- pin positions: 325
- solder tail length: 0.170 in. for PC boards 0.125 in. thick (other tail
  lengths available)
- Dimensions:
    Height: 0.350 inch maximum to device plane and 0.650 inch to top of
            handle in closed position
    Width:  2.700 by 2.875 inches maximum
Features:

- can be used for test and burn-in
- spring contacts are normally closed
- easy package loading because of large funnel entry
- zero insertion force
- contact wiping action during socket closing ensures clean contact points
- operating temperature is 160°C (burn-in capability)
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The information in this documentis assist you in meeting the design require- 
ments of the XDS510 emulator. This information supports XDS510 Cable 
no. 2563988-001, rev B. 


The TMS320C4x family supports emulation through a dedicated emulation 
port. The emulation port is a superset of the IEEE 1149.1 (JTAG) standard 
and can be accessed by the XDS510 emulator. For details on the JTAG pro- 
tocol, refer to the IEEE 1149.1 specification. 


This appendix contains the following sections: 


Section                                                          Page
B.1 Header and Header Signals .................................. B-2
B.2 Bus Protocol ............................................... B-3
B.3 Cable Pod .................................................. B-4
B.4 Test Clock Generated in Target System ...................... B-7
B.5 Multiprocessor Configuration ............................... B-8
B.6 Emulation Timing Calculations .............................. B-11
B.1 Header and Header Signals


To perform emulation with the XDS510, your target system must have a 
14-pin header (two 7-pin rows) with connections as shown in Figure B—1. 
Table B—1 describes the emulation signals. 


Although you can use other headers, recommended parts include:

- Straight header, unshrouded: DuPont Electronics™ part number 67996-114
- Right-angle header, unshrouded: DuPont Electronics™ part number 68405-114


Figure B-1. 14-Pin Header Signals and Header Dimensions

    TMS          TRST
    TDI          GND
    PD (+5 V)    No pin (key)
    TDO          GND
    TCK_RET      GND
    TCK          GND
    EMU0         EMU1

Header dimensions:
    Pin-to-pin spacing: 0.100 in. (X,Y)
    Pin width: 0.025 in., square post
    Pin length: 0.235 in., nominal


Table B-1. 14-Pin Header Signal Description

Signal    XDS510   Target   Description
          State†   State†
TMS       O        I        JTAG test mode select
TDI       O        I        JTAG test data input
TDO       I        O        JTAG test data output
TCK       O        I        JTAG test clock. TCK is a 10-MHz clock source from
                            the emulation cable pod. This signal can be used
                            to drive the system test clock.
TRST      O        I        JTAG test reset
EMU0      I        I/O      Emulation pin 0
EMU1      I        I/O      Emulation pin 1
PD        I        O        Presence detect. Indicates that the emulation
                            cable is connected and that the target is powered
                            up. PD should be tied to +5 volts in the target
                            system.
TCK_RET   I        O        JTAG test clock return. Test clock input to the
                            XDS510 emulator. May be a buffered or unbuffered
                            version of TCK.

† I = input; O = output
B.2 Bus Protocol


The IEEE 1149.1 specification covers the requirements for JTAG bus slave 
devices (such as the TMS320C4x family) and provides certain rules. Those 
rules are summarized as follows: 


- The TMS/TDI inputs are sampled on the rising edge of the TCK signal of the
  device.

- The TDO output is clocked from the falling edge of the TCK signal of the
  device.


When JTAG devices are daisy-chained together, the TDO of one device has a
setup time of approximately half a TCK cycle to the next device's TDI signal.
This type of timing scheme minimizes race conditions that would occur if both
TDO and TDI were timed from the same TCK edge. The penalty for this timing
scheme is a reduced TCK frequency.


The IEEE 1149.1 specification does not provide rules for JTAG bus master
(XDS510) devices. Instead, it states that it expects a bus master to provide
bus slave compatible timings. The XDS510 provides timings that meet the
bus slave rules and also provides an optional timing mode that allows you
to run the emulation at a much higher frequency for improved performance
by avoiding the timing penalty described above.
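
As a rough illustration of the half-cycle penalty, the following Python sketch estimates the window between a falling TCK edge (where TDO changes) and the next rising TCK edge (where the next device samples TDI). The function name, the 50% default, and the 40% low-time figure are assumptions chosen for illustration; they are not XDS510 specifications.

```python
def falling_to_rising_window_ns(tck_mhz: float, low_fraction: float = 0.5) -> float:
    """Return the time (ns) from a falling TCK edge to the next rising edge.

    tck_mhz      -- test clock frequency in MHz (hypothetical input)
    low_fraction -- fraction of the period that TCK spends low (assumed)
    """
    if tck_mhz <= 0:
        raise ValueError("tck_mhz must be positive")
    if not 0.0 < low_fraction < 1.0:
        raise ValueError("low_fraction must be between 0 and 1")
    period_ns = 1000.0 / tck_mhz  # 1 MHz corresponds to a 1000-ns period
    return period_ns * low_fraction


if __name__ == "__main__":
    # Compare an ideal 50% clock with a skewed clock that is low 40% of the time.
    for mhz in (10.0, 20.0):
        ideal = falling_to_rising_window_ns(mhz)
        skewed = falling_to_rising_window_ns(mhz, low_fraction=0.4)
        print(f"{mhz:5.1f} MHz: {ideal:.1f} ns (50% low), {skewed:.1f} ns (40% low)")
```

The TDO delay of one device plus the TDI setup time of the next must fit inside this window, which is why a path timed across opposite clock edges limits the usable TCK frequency more than a path timed across a full period.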




B.3 Cable Pod 


Figure B—2 shows a portion of the XDS510 emulator cable pod. These are 
the functional features of the emulator pod: 


- Signals TDO and TCK_RET can be parallel-terminated inside the pod if
  required by the application. By default, these signals are not terminated.

- Signal TCK is driven with a 74AS1034 device. Because of the high current
  drive (48 mA IOL/IOH), this signal can be parallel-terminated. If TCK is
  tied to TCK_RET, then you can use the parallel terminator in the pod.

- Signals TMS and TDI can be generated from the falling edge of TCK_RET,
  according to the IEEE 1149.1 bus slave device timing rules. They can also
  be driven from the rising edge of TCK_RET, which allows a higher TCK_RET
  frequency. The default is to match the IEEE 1149.1 slave device timing
  rules. This is an emulator software option that can be selected when the
  emulator is invoked. In general, single-processor applications can benefit
  from the higher clock frequency. However, in multiprocessing applications,
  you may wish to use the IEEE 1149.1 bus slave timing mode to minimize
  emulation system timing constraints.

- Signals TMS and TDI are series-terminated to reduce signal reflections.

- A 10-MHz test clock source is provided. You may also provide your own test
  clock for greater flexibility.
Figure B-2. Emulator Pod Interface

[Schematic. Header connections shown: TMS (pin 1), TRST (pin 2), TDI (pin 3),
GND (pins 4, 6, 8, 10, 12), PD (pin 5), TDO (pin 7), TCK_RET (pin 9),
TCK (pin 11), EMU0 (pin 13), EMU1 (pin 14). Devices shown in the pod:
74AS258, 74AS1034, 74AS1004.]


Figure B-3 and Table B-2 show the signal timings for the XDS510. Timing
parameters are calculated from standard data sheet parts used in the cable
pod. These timings are for reference only. Texas Instruments does not test
or guarantee these timings.

The emulator pod uses TCK_RET as its clock source for internal
synchronization. TCK is provided as an optional target system test clock
source.




Figure B-3. Emulator Pod Timings

[Timing diagram. Waveforms shown: TCK_RET (1.5-V reference level),
TMS/TDI (default), TMS/TDI (optional), and TDO.]


Table B-2. Emulator Pod Timing Parameters

No.  Reference                 Description                                          Min   Max   Units
1    tTCKmin, tTCKmax          TCK_RET period                                       95    200   ns
2    tTCKhighmin               TCK_RET high pulse duration                          15          ns
3    tTCKlowmin                TCK_RET low pulse duration                                       ns
4    td(XTMXmin), td(XTMXmax)  TMS/TDI valid from TCK_RET low (default timing)      6     20    ns
5    td(XTMSmin), td(XTMSmax)  TMS/TDI valid from TCK_RET high (optional timing)    7     24    ns
6    tsu(XTDOmin)              TDO setup time to TCK_RET high                       3           ns
7    thd(XTDOmin)              TDO hold time from TCK_RET high                      120         ns


It is extremely important to provide high-quality signals between the
emulator and the target processor. If the distance between the emulation
header and the processor is greater than 6 inches, the emulation signals
should be buffered. Sections B.4 and B.5 illustrate typical connections
between the target processor and the emulation header.
B.4 Test Clock Generated in Target System


Figure B-4 shows an application with the system test clock generated in the
target system. In this application the TCK signal is left unconnected.


Figure B-4. Target-System Generated Test Clock

[Connection diagram between the emulator header and the TMS320C4x. Labels
recoverable from the figure: "Greater Than" (distance callout), +5 V, EMU0,
TDI, TCK_RET, and System Test Clock.]


There are two benefits to having the target system generate the test clock: 


1) You can set the test clock frequency to match your system
requirements. The emulator provides only a single 10-MHz test clock.


2) You may have other devices in your system that require a test clock 
when the emulator is not connected. 


B.5 Multiprocessor Configuration 


Figure B-5. Multiprocessor Connections

[Connection diagram showing two TMS320C4x devices.]


Figure B—5 shows a typical multiprocessor configuration. This is a daisy- 
chained configuration (TDO-TDI daisy-chained) that meets the minimum 
requirements of the IEEE 1149.1 specification. The emulation signals in this 
example are buffered to isolate the processors from the emulator and pro- 
vide an adequate signal drive for the target system. One of the benefits of 
a JTAG test interface is that you can generally slow down the test clock to 
eliminate timing problems. Several key points to multiprocessor support are 
as follows: 

- The processor TMS, TDI, TDO, and TCK should be buffered through the same
  physical package to better control timing skew.

- The input buffers for TMS, TDI, and TCK should have pullups to 5 volts.
  This will hold these signals at a known value when the emulator is not
  connected. A pullup of 4.7 kΩ or greater is suggested.

- Buffering EMU0 and EMU1 is optional but highly recommended to provide
  isolation. These are not critical signals and do not need to be buffered
  through the same physical package as TMS, TCK, TDI, and TDO. Unbuffered
  and buffered signals are shown in Figure B-6 and Figure B-7.
No signal buffering. In this situation, the distance between the header and
the processor should be no more than 6 inches.


Figure B-6. Unbuffered Signals

[Connection diagram. Labels recoverable from the figure: "6 Inches or Less"
(distance callout), +5 V, TMS320C4x, and GND.]


Emulation signals buffered. The distance between the emulation header 
and the processor is greater than 6 inches. The emulation signals — TMS, 
TDI, TDO, and TCK_RET — are buffered through the same package. 


Figure B-7. Buffered Signals

[Connection diagram between the TMS320C4x and the emulator header, separated
by a distance greater than 6 inches. Signals shown: EMU0 (header pin 13),
EMU1, TRST, TMS, TDI, TDO, TCK, and TCK_RET; +5 V is also labeled.]


- The EMU0 and EMU1 signals must have pullups to 5 volts. The pullup
  resistor value should be chosen to provide a signal rise time of less than
  10 µs. A 4.7-kΩ resistor is suggested for most applications. EMU0-1
  are I/O pins on the 'C4x; however, they are only inputs to the XDS510.
  In general, these pins are used in multiprocessor systems to provide
  global run/stop operations.

- It is extremely important to provide high-quality signals, especially on
  the processor TCK and the emulator TCK_RET signal. In some cases,
  this may require you to provide special PWB trace routing and use
  termination resistors to match the trace impedance. The emulator pod
  does provide optional internal parallel terminators on TCK_RET
  and TDO. TMS and TDI provide fixed series termination.
B.6 Emulation Timing Calculations


The following examples show how to calculate the emulation timings in
your system. For actual target timing parameters, see the appropriate
device data sheets.

Assumptions:
    tsu(TTMS)     Target TMS/TDI setup to TCK high                    10 ns
    th(TTMS)      Target TMS/TDI hold from TCK high                    5 ns
    td(TTDO)      Target TDO delay from TCK low                       15 ns
    td(bufmax)    Target buffer delay, maximum                        10 ns
    td(bufmin)    Target buffer delay, minimum                         1 ns
    tbufskew      Target buffer skew between two devices
                  in the same package:
                  [td(bufmax) - td(bufmin)] x 0.15                  1.35 ns
    ttckfactor    Assume a 40/60 duty cycle clock                    0.4

Given in Table B-2 (page B-6):
    td(XTMXmax)   XDS510 TMS/TDI delay from TCK_RET low, maximum      20 ns
    td(XTMXmin)   XDS510 TMS/TDI delay from TCK_RET low, minimum       6 ns
    td(XTMSmax)   XDS510 TMS/TDI delay from TCK_RET high, maximum     24 ns
    td(XTMSmin)   XDS510 TMS/TDI delay from TCK_RET high, minimum      7 ns
    tsu(XTDOmin)  TDO setup time to XDS510 TCK_RET high                3 ns


There are two key timing paths to consider in the emulation design:
the TCK_RET/TMS/TDI (tprdtck_TMS) path and
the TCK_RET/TDO (tprdtck_TDO) path.

In each case, the worst case path delay is calculated to determine the maxi- 
mum system test clock frequency. 
Case 1: Single processor, direct connection, TMS/TDI timed from TCK_RET low
(default timing).

    tprdtck_TMS = [td(XTMXmax) + tsu(TTMS)] / ttckfactor
                = (20 ns + 10 ns) / 0.4
                = 75 ns (13.3 MHz)

    tprdtck_TDO = [td(TTDO) + tsu(XTDOmin)] / ttckfactor
                = (15 ns + 3 ns) / 0.4
                = 45 ns (22.2 MHz)

In this case, the TCK/TMS path is the limiting factor.


Case 2: Single processor, direct connection, TMS/TDI timed from TCK_RET high
(optional timing).

    tprdtck_TMS = td(XTMSmax) + tsu(TTMS)
                = 24 ns + 10 ns
                = 34 ns (29.4 MHz)

    tprdtck_TDO = [td(TTDO) + tsu(XTDOmin)] / ttckfactor
                = (15 ns + 3 ns) / 0.4
                = 45 ns (22.2 MHz)

In this case, the TCK/TDO path is the limiting factor. One other thing to
consider in this case is the TMS/TDI hold time. The minimum hold time for the
XDS510 cable pod is 7 ns, which meets the 5-ns hold time of the target
device.


Case 3: Single/multiple processor, TMS/TDI buffered input; TCK_RET/TDO
buffered output, TMS/TDI timed from TCK_RET high (optional timing).

    tprdtck_TMS = td(XTMSmax) + tsu(TTMS) + 2 td(bufmax)
                = 24 ns + 10 ns + 2 (10 ns)
                = 54 ns (18.5 MHz)

    tprdtck_TDO = [td(TTDO) + tsu(XTDOmin) + tbufskew] / ttckfactor
                = (15 ns + 3 ns + 1.35 ns) / 0.4
                = 48.4 ns (20.7 MHz)

In this case, the TCK/TMS path is the limiting factor. The hold time on
TMS/TDI is also reduced by the buffer skew (1.35 ns) but still meets the
minimum device hold time.
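
The arithmetic in Cases 1 through 3 can be checked with a short script. The following Python sketch is only an illustration: the class, field, and function names are hypothetical, and the default values are the example assumptions listed at the start of this section, not guaranteed device timings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmulationTimings:
    """Example timing values in ns; all names here are illustrative."""
    tsu_ttms: float = 10.0      # target TMS/TDI setup to TCK high
    td_ttdo: float = 15.0       # target TDO delay from TCK low
    td_bufmax: float = 10.0     # target buffer delay, maximum
    td_bufmin: float = 1.0      # target buffer delay, minimum
    td_xtmx_max: float = 20.0   # pod TMS/TDI delay from TCK_RET low (default)
    td_xtms_max: float = 24.0   # pod TMS/TDI delay from TCK_RET high (optional)
    tsu_xtdo_min: float = 3.0   # pod TDO setup to TCK_RET high
    tck_factor: float = 0.4     # worst-case half of a 40/60 duty cycle clock

    @property
    def buf_skew(self) -> float:
        """Skew between two buffers in one package (15% of the delay spread)."""
        return (self.td_bufmax - self.td_bufmin) * 0.15


def to_mhz(period_ns: float) -> float:
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / period_ns


def case1(t: EmulationTimings) -> tuple[float, float]:
    """Direct connection, default timing: both paths span half a cycle."""
    tms = (t.td_xtmx_max + t.tsu_ttms) / t.tck_factor
    tdo = (t.td_ttdo + t.tsu_xtdo_min) / t.tck_factor
    return tms, tdo


def case2(t: EmulationTimings) -> tuple[float, float]:
    """Direct connection, optional timing: TMS/TDI spans a full cycle."""
    tms = t.td_xtms_max + t.tsu_ttms
    tdo = (t.td_ttdo + t.tsu_xtdo_min) / t.tck_factor
    return tms, tdo


def case3(t: EmulationTimings) -> tuple[float, float]:
    """Buffered signals, optional timing."""
    tms = t.td_xtms_max + t.tsu_ttms + 2.0 * t.td_bufmax
    tdo = (t.td_ttdo + t.tsu_xtdo_min + t.buf_skew) / t.tck_factor
    return tms, tdo


if __name__ == "__main__":
    timings = EmulationTimings()
    for name, fn in (("Case 1", case1), ("Case 2", case2), ("Case 3", case3)):
        tms, tdo = fn(timings)
        worst = max(tms, tdo)  # the longer period sets the maximum test clock
        print(f"{name}: TMS {tms:.1f} ns, TDO {tdo:.1f} ns, "
              f"max TCK about {to_mhz(worst):.1f} MHz")
```

Replacing the defaults with values from your own data sheets gives a quick estimate of the maximum test clock frequency for a given board.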
Case 4: Single/multiprocessor, TMS/TDI/TCK buffered input; TDO buffered
output, TMS/TDI timed from TCK_RET low (default timing).

    tprdtck_TMS = [td(XTMSmax) + tsu(TTMS) + tbufskew] / ttckfactor
                = (24 ns + 10 ns + 1.35 ns) / 0.4
                = 88.4 ns (11.3 MHz)

    tprdtck_TDO = [td(TTDO) + tsu(XTDOmin) + td(bufmax)] / ttckfactor
                = (15 ns + 3 ns + 10 ns) / 0.4
                = 70 ns (14.3 MHz)

In this case, the TCK/TMS path is the limiting factor.
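
One way to generalize the Case 4 arithmetic is to sum the delays on each path, scale by the duty-cycle factor, and pick the longer period. The following Python sketch assumes hypothetical helper names (`min_tck_period_ns`, `limiting_path`) and uses the delay values exactly as printed in Case 4, including the 24-ns pod delay.

```python
def min_tck_period_ns(delays_ns, tck_factor=1.0):
    """Minimum clock period (ns) for a path.

    delays_ns  -- iterable of delays on the path, in ns
    tck_factor -- fraction of the period available to the path
                  (0.4 for a worst-case 40/60 clock, 1.0 for a full cycle)
    """
    if not 0.0 < tck_factor <= 1.0:
        raise ValueError("tck_factor must be in the range (0, 1]")
    return sum(delays_ns) / tck_factor


def limiting_path(paths):
    """Return (name, period_ns) of the path that needs the longest period."""
    name = max(paths, key=paths.get)
    return name, paths[name]


# Buffer skew from the section's example assumptions: 15% of (10 ns - 1 ns).
BUF_SKEW_NS = (10.0 - 1.0) * 0.15

# Case 4 values as printed in the text (illustrative, not guaranteed timings).
case4 = {
    "TCK/TMS": min_tck_period_ns([24.0, 10.0, BUF_SKEW_NS], tck_factor=0.4),
    "TCK/TDO": min_tck_period_ns([15.0, 3.0, 10.0], tck_factor=0.4),
}

if __name__ == "__main__":
    name, period = limiting_path(case4)
    print(f"Limiting path: {name}, {period:.1f} ns, about {1000.0 / period:.1f} MHz")
```

Keeping each path as a plain list of delays makes it easy to add an extra buffer stage or substitute data sheet values without changing the helper functions.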


In a multiprocessor application, it is necessary to ensure that the EMU0-1
lines can go from a logic low level to a logic high level in less than 10 µs.
This can be calculated as follows (remember that t = 5 RC):

    trise = 5 (Rpullup x Ndevices x Cload_per_device)
          = 5 (4.7 kΩ x 16 x 15 pF)
          = 5.64 µs
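
The same rise-time estimate can be turned around to find how many devices a given pullup can support. The following Python sketch uses hypothetical function names and takes the 15-pF load per device and the 10-µs limit from the example above as assumptions.

```python
import math


def emu_rise_time_us(r_pullup_ohms: float, n_devices: int, c_load_pf: float) -> float:
    """Estimate the EMU0-1 rise time in microseconds using t = 5 RC."""
    total_c_farads = n_devices * c_load_pf * 1e-12
    return 5.0 * r_pullup_ohms * total_c_farads * 1e6


def max_devices(r_pullup_ohms: float, c_load_pf: float, limit_us: float = 10.0) -> int:
    """Largest device count whose estimated rise time stays within limit_us."""
    per_device_us = emu_rise_time_us(r_pullup_ohms, 1, c_load_pf)
    return math.floor(limit_us / per_device_us)


if __name__ == "__main__":
    rise = emu_rise_time_us(4700.0, 16, 15.0)
    print(f"16 devices: {rise:.2f} us")
    print(f"Device limit for a 4.7-kohm pullup: {max_devices(4700.0, 15.0)}")
```

This treats the pullup and the summed pin capacitance as a single RC network and ignores trace capacitance, so the result is an optimistic estimate.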
Note: Primary sources are in boldface.


A-law compression, expansion, 12-46 
adaptive filters, 12-58 
ADDC instruction, 12-41 
addition, floating point, 4-20 
address buses 
address reach (space), 1-5 
external, 2-27 
general, 1-5 
address range 
LSTRB0,1-field specified, 7-11
STRB0,1-field specified, 7-10
addressing modes
conditional branch, 2-15, 5-24 
general, 2-15, 5-19 
parallel, 2-15, 5-23 
three operand, 2-15, 5-20
addressing types, 5-2 
direct addressing, 5-4 
immediate, 5-17 
indirect addressing, 5-5—5-16 
PC relative, 5-17 
register, 5-3 
ALU, 2-4 
analysis module 
general, 1-5 
registers, 3-21 
ANSI C compiler, 1-9 
applications 
hardware, 13-1 
list, 1-11 


software, 12-1
ARAU (auxiliary register arithmetic unit), 2-6 
arithmetic logic unit (ALU), 2-4 
arithmetic operations, 12-28 
assembly language instructions, 11-1 
categories, 11-3—11-9 
interlocked operation, 11-7 
load and store, 11-3 
parallel operation, 11-8 
program control, 11-6 
three-operand, 11-6
two-operand, 11-4
condition codes, flags, 11-10 
example instruction, 11-18 
register syntax, 11-17
summary, 2-16—2-25, 11-3—11-9 
symbols used to define, 11-14—11-17 
syntax options, 11-15—11-17
auxiliary register arithmetic units (ARAUs), 
2-6 
auxiliary registers (AR0-AR7), 2-6, 3-5


Bcond instruction, 12-11 
BcondAF, BcondAT instructions, 6-8 
benchmarks, FFT timing, 12-88 
biquads, 12-53 
bit manipulation, 12-28 
bit-reversed addressing, 5-30, 12-31 
modify example, 5-16 
block diagrams 
communication port control register, 8-10 
communication ports, 8-4, 8-5 
CPU, 2-5 
memory organization, 2-11 
peripheral modules, 2-28 
timers, 9-45 
TMS320C40, 2-2 
block moves, 12-29 
block repeat, 6-2 
example, 12-24 
registers (RS, RE), 3-14, 12-26 
block repeat registers (RS, RE), 3-14, 12-26 
block size (BK) register, 3-5, 12-52 
boot loader, 13-5 
communication port, 13-8 
external memory, 13-8 
source program, 13-14 
branches, 6-7, 6-9, 12-22 
delayed, 6-7 
BRD instruction, 12-95 
bus operation 
arbitration, 13-48, 13-70 
external, 2-27 
internal, 2-26
busy-waiting example, 6-15 
byte manipulation, 12-30 


cache, optimization of code, 12-96 
cache memory, 2-10 
algorithm, 3-27 
architecture, 2-10, 3-25 
control bits, 3-29 
general, 1-6 
hit, 3-27
instruction cache, 3-25 
miss, 3-27 
optimization of code, 12-96 
size, 1-6 
CALL instruction, 6-9, 12-13 
CALLcond instruction, 6-9, 12-27 
calls, 6-9 
example code, 12-9 
zero overhead, 12-11 
central processing unit, 2-4 
channel control register. See DMA channel
control register
circular addressing, 5-25 
circular modify example, 5-12 
communication port arbitration unit (PAU), 
8-5 
communication port control register, 8-5, 
8-10 
field descriptions, 8-10 
memory map, 3-23 
communication ports 
applications, 12-98 
architecture, 2-29 
benefits, 1-7 
block diagram, 8-4, 8-5 
control register. See communication port 
control register 
features, 8-3 
general, 1-4 
memory map, 3-23, 8-8
port arbitration unit (PAU), 8-12 
synchronizer timing, 8-32 
throughput, 1-4, 1-7 
timing, 8-18, 8-32, 14-31 
companding, 12-46 


compiler, 1-9


computed GOTOs, 12-27 


condition codes, flags, 11-12


conditional delayed branches, 6-7
conditional-branch addressing modes, 2-15,
5-24
context switching, 12-15
conversion of format
2s complement floating-point to IEEE, 
4-13 
extended-prec floating-point to single- 
prec floating-point, 4-10 
floating point to integer, 4-28 
IEEE single prec. std. 754, 4-11
IEEE std. 754, 4-11
IEEE to 2s complement floating-point, 
4-12 
IEEE to/from 'C40, 12-42 
integer to floating point, 4-30 


short floating point to extended-prec. 
floating point, 4-9 
short floating point to single-prec. floating 
point, 4-9 
single-prec. floating-point to extended- 
prec floating-point, 4-10 
single-prec. 2s compl. floating-point, 4-11 
counter example, 6-15 
counter register (timer), 9-50 
See also timers 
CPU 
architecture, 2-4 
buses, 2-26 
general, 1-4 
instruction cycle times, 1-4 
primary register file, 3-3 
throughput, 1-4 
CPU internal interrupt enable register (IIE), 
2-8, 3-10 
CPU primary register file, 3-3 
CPU registers, 3-3 
auxiliary (AR0-AR7), 2-6, 3-5
block repeat (RS, RE), 3-14, 12-26 
block size (BK), 2-7, 3-5 
data page pointer (DP), 2-7, 3-5, 5-4 
DMA interrupt enable (DIE), 2-8, 3-8 
bit descriptions, 3-9 
extended precision (R0-R11), 2-6, 3-4
IIOF flag register (IIF), 2-8, 3-12
index (IR1, IR0), 2-7, 3-5
internal interrupt enable (IIE), 2-8, 3-10 
bit descriptions, 3-11 
list of, 2-7, 3-3 
primary register file, 3-3 
program counter (PC), 2-9, 2-26, 3-14 
repeat count (RC), 2-8, 3-14, 6-2, 12-26 
repeat end address (RE), 3-14, 6-2 
repeat start address (RS), 3-14, 6-2 
reserved bits, 3-14 
stack pointer (SP), 2-8, 3-5 
application, 12-13 


status register (ST), 2-8, 3-5, 11-11 
bit descriptions, 3-6 


data buses 
external, 2-27 
general, 1-5 
transfer rate, 1-5 
data page pointer (DP), 2-7, 3-5, 5-4 
delayed branches, 6-7 
example, 12-23 
incorrectly placed, 6-6, 6-7 
optimization use, 12-95 
dequeues (stack), 5-33 
development tools. See software development tools
dimensions ('C40), 14-11 
direct addressing, 5-4 
direct memory access. See DMA coprocessors
disabled interrupts by branch, 6-8 
displacements, 5-5—5-16 
division 
floating point, 12-33 
integer, 12-33
DMA. See DMA coprocessors 
DMA channel control register, 9-7 
AUTOINIT STATIC bit, 12-105 
PRI bits, 9-14 
bit definitions, 9-8 
field descriptions, 9-8 
START bits, 12-103 
STATUS bits, 9-16 
SYNC MODE bits, 12-103 
TRANSFER MODE field, 9-28 
DMA coprocessors 
architecture, 2-29 
autoinitialization, 9-31, 12-105 
example, 12-107 
benefits, 1-8 
buses, 2-26 
channel address register, 9-16 
channel control register 
AUTOINIT STATIC bit, 12-105 
PRI bits, 9-14 
START bits, 9-15 
STATUS bits, 9-16 
SYNC MODE bits, 9-15 
TRANSFER bits, 9-14 
channel register map, 3-24 
channel synchronization, 9-41—9-46 
features, 9-2, 9-3 
functional description, 9-3 
general, 1-4, 2-29 
index register, 9-16 
interrupts, 9-40, 12-102 
example of use, 12-107 
link-pointer register, 9-19, 9-38
example, 12-105 
memory mapped registers, 9-4 
operation examples, 12-101 
priorities, 9-22 
priority wheel, 9-24 
registers, 9-5, 9-7 
split mode example, 12-104 
START bits, 12-103 
SYNC MODE bits, 12-103 
synchronization of channels, 9-41—9-46 
throughput, 1-8 
transfer count register, 9-18 
transfer description, 9-5, 12-103 
TRANSFER MODE field, 9-28 
unified and split modes, 9-20 
DMA interrupt enable register (DIE), 2-8, 3-8 
double precision, fixed point, 12-41 


edge-triggered interrupts, 6-23 

electrical characteristics, 14-13 

electrical specifications, 14-12 

emulator (XDS510), 1-9

event counters. See timers 

expansion register file, 2-9, 3-15 
interrupt vector table (IVT), 3-16, 6-26 

application, 12-19 

trap vector table (TVT), 3-15 
extended precision number, floating-point
format, 4-8 

extended precision registers, 2-6, 3-4, 12-41 
floating point format, 3-4 
integer format, 3-4 
saving (example), 12-13 

external buses (global, local), wait states, 
7-15 

external interrupts, 2-27 


fast Fourier transforms, 12-31, 12-63 
DIF (decimation in frequency), 12-64, 
12-65, 12-70 
DIT (decimation in time), 12-64, 12-78 
timing benchmarks, 12-88 
twiddle factors, 12-68 
features (of TMS320C40), 1-4 
FFT. See fast Fourier transforms 
filters 
adaptive, 12-58 
FIR, 12-51, 12-59 
IIR, 12-53, 12-54 
lattice, 12-88 
FIR filters, 12-51, 12-59 
FIX instruction, 4-28, 12-33 
FLOAT instruction, 4-30, 12-33 
floating point 
addition, 4-20
conversion (to/from IEEE), 12-42 
conversion to integer, 4-28 
extended-precision format, 4-8 
format conversion, 4-9 
formats, 12-43 
IEEE, 12-44 
multiplication, 4-15 
normalization, 4-20, 4-24 
pop and push, 12-13 
reciprocal, 4-31 
register format, 3-4 
rounding value, 4-26 
short format, 4-6 
single-precision format, 4-7 
subtraction, 4-20 
underflow, 4-21 
flush pipeline, 12-22
formats 
conversion, floating-point, 4-9 
See also conversion of formats 
floating point, 12-43 
signed integer, 4-3 
unsigned integer, 4-4 
FRIEEE instruction, 12-43 


general addressing modes, 2-15, 5-19 
global control register (timer), 9-47 
global memory, 6-13, 6-17 
interface, 2-27, 13-20 
global memory interface. See memory inter- 
face (local, global) 
GOTOs, 12-27 


H1/H3 timing, 14-15 


IACK instruction, 7-47
IACK pin, 7-47 
timing, 14-30 
ICFULL flag 
description, 12-99 
enabling, 3-11 
ICRDY flag 
description, 12-99 
enabling, 3-11 
example of use, 12-106 
interrupt use, 3-9 
IEEE std. 754 (conversions), 4-11 
IIOF flag register (IIF), 2-8, 3-12, 12-103
IIOF pins
boot loader use, 13-6 
loading, 13-14 
timing, 14-24, 14-29 
IIR filters, 12-54 
immediate addressing, 5-17 
index registers (IR0, IR1), 2-7, 3-5, 9-16
indirect addressing, 5-5


initialization of processor, 12-3 
example code, 12-4 
instruction cache, 3-25
instruction register (IR), 2-26 
instruction set summary, 2-16—2-25, 
11-3—11-9
functional groups, 11-3 
instructions, Chapter 11 
integer formats 
short integer, 4-3 
signed, 4-3 
single-precision integer, 4-3 
unsigned, 4-4 
interfaces, 13-3 
external, 13-3 
memory. See memory interfaces (local, 
global) 
parallel processing, 13-37 
shared bus, 13-43 
interlocked instructions, 2-27, 6-13, 7-39 
interlocked operations, 6-13 
internal bus, 2-26 
internal interrupt enable register (IIE), 2-8, 
3-10 
interrupt service routine, 12-14, 12-21 
interrupt vector table (IVT), 3-16 
application, 12-19 
boot loader use, 13-7 
interrupts, 2-27, 6-23 
answering, 12-28 
communication port, 12-100 
context switching, 12-15 
control bits, 6-24 
DMA, 6-25, 9-40, 12-102 
example, 12-107 
edge/level triggered, 6-23 
example, 12-28 
external, 2-27 
initiation condition, 6-11 
NMI, 6-23, 12-14 
prioritizing, 6-24 
processing, 6-27 
service routines, 12-14, 12-21 
trap comparison, 6-11 
vectors, 3-20, 6-25, 6-26 
inverse of floating point, 12-36
IR filters, 12-51, 12-53 
IVTP. See interrupt vector table (IVT) 


JTAG emulation timing, 14-38, B-3 
jumps, 6-9
zero-overhead use, 12-11 


LAJ instruction, 6-9, 12-11, 12-22, 12-95 
LAJcond instruction, 6-9 
LAT instruction, 12-22 
LATcond instruction, 6-9, 6-11 
lattice filter, 12-88 
LB, LBU instructions, 12-30 
LDFI instruction, 6-13, 7-40 
timing, 14-20 
LDII instruction, 6-13, 7-40
timing, 14-20 
level-triggered interrupts, 6-23
LH, LHU instructions, 12-30 
LMS algorithm, 12-58 
local memory interface, 2-27, 13-20 


See also memory interface (local, global) 


LOCK signal, 7-39 

logical operations, 12-28 
loops, 12-23—12-26 

LWL, LWR instructions, 12-30 


MB, MH instructions, 12-30 
mechanical data, 14-11 
memory, 2-10 
See also memory interface 
accesses 
fetches, 10-20 
loads, stores, 10-21 
pipeline, 10-20 
timing, 10-20 
cache, 2-10, 3-25 


communication ports memory map, 8-8 
general organization, 2-10 
global, 6-13, 6-17 
interfaces. See memory interface (local, 
global) 
memory interface control registers, 3-21 
memory maps, 2-12 
analysis module registers, 3-21 
communication ports, 3-23 
DMA, 9-4 
DMA coprocessors, 3-24 
memory interface control registers, 
3-21 
overall description, 2-13 
peripherals, 2-14 
timer registers, 3-22, 9-46 
organization, 2-10, 3-18 


pipeline conflicts, 10-11, 10-18 


RAM, 2-10 
zero wait states, 13-21 

ranges, 7-10 

registers. See memory interface control 
registers 

ROM, 2-10 
interface to 'C40, 13-9

ROMEN pin effect, 3-18 

sharing, 6-16 

timing, 7-17, 10-20, 14-18 


memory interface (local, global), 13-20 


bus arbitration, 13-48 
control registers. See memory interface 
control registers 
control signals, 7-3 
features, 7-1 
RAM (zero wait states), 13-21 
ready generation, 7-15, 7-17, 13-27 
shared bus, 13-43 
shared with bus arbitration, 13-38 
signals, 7-3 
strobes 
single, 13-21 
two banks, 13-25 
timing, 7-17 


wait states, 7-15, 13-27 
memory interface control registers, 3-21, 7-6
address ranges, 7-10, 7-11 
bit contents, 7-7 
boot loader use, 13-7 
example configuration, 13-46 
LSTRB ACTIVE field, 13-21 
page size, 7-9 
PAGESIZE field, 13-21, 13-32 
reset effect, 7-6 
STRBx SWW field, 7-16
STRBx SWW fields, 13-28 
timing, 7-17 
wait states, 7-15 
memory maps, 2-12—2-14, 3-18 
analysis module registers, 3-21 
communication ports, 3-23, 8-8 
DMA, 9-4 
DMA coprocessors, 3-24 
memory interface control registers, 3-21 
overall description, 2-13 
peripherals, 2-14 
timer registers, 3-22, 9-46 
MPYI3 instruction, 12-42 
MPYSHI3 instruction, 12-42 
multiple processors, 6-13
multiplication, matrix vector, 12-61 
multiplication, floating point, 4-15 
multiplier, 2-4 


nested block repeats, 6-6 

NMI, 6-23 

NORM instruction, 4-24 

normalization, 12-38 
floating point value, 4-20, 4-24 

OCEMPTY flag 
description, 12-99 
enabling, 3-11 

OCRDY flag 
description, 12-99 
enabling, 3-11 
interrupt use, 3-9 


optimization (assembler code), 12-95 
overflow, 4-21, 4-28 


P flag (cache), 3-25 
packing data example, 12-30 
page 
size, 7-9, 7-13 
switching, 13-32 
timing, 14-23 
parallel addressing modes, 2-15, 5-23 
parallel instruction set 
optimization use, 12-95 
summary, 2-23—2-25 
parallel processing 
'C40-to-'C40, 13-37
general, 1-3
shared bus, 13-43
PAU (port arbitration unit), 8-12
See also port arbitration unit
operation, 8-12 
performance, 1-6 
period register (timer), 9-50 
See also timers 
peripheral bus, 2-28 
communication port, 2-29 
general architecture, 2-28 
map, 3-22
pin (TMS320C40) 
descriptions, 14-7 
names, 14-2 
pin states at reset, 6-18 
pinouts, 14-2 
pipeline, 10-1 
conflicts 
avoiding, 12-96 
branching, 10-4 
memory, 10-11 
memory (resolving), 10-18 
registers, 10-8 
flush, 12-22 
memory accesses, 10-20 
structure, 10-2
POPF instruction, 12-13 
port arbitration unit (PAU), 8-5, 8-12

synchronizer timing, 8-32 
postdisplacement examples, 5-11 
postindex examples, 5-15 
predisplacement examples, 5-9 
preindex examples, 5-13 
primary register file (CPU), 2-6, 3-3 
priority (memory) 

fixed, 13-70 

rotating, 13-58 
priority wheel (DMA), 9-24 
products, 320 family, 1-2 
program 

buses, 2-26 

control, 12-9 

flow, 6-1
program counter (PC), 2-9, 2-26, 3-14 
programming methodology, tips, 12-94 
PUSHF instruction, 12-13


queues (stack), 5-33 


RAM, 2-10 
zero wait states, 13-21 
RCPF instruction, 4-31, 12-33, 12-36 
ready 
generation, 7-15, 13-27 
timing, 7-17 
reciprocal (RCPF inst.), 4-31 
reciprocal square root (RSQRF inst.), 4-33 
register buses, 2-26 
registers, 2-7
auxiliary (AR0-AR7), 2-6, 3-5
block repeat (RS, RE), 3-14, 12-26 
block size (BK), 2-7, 3-5 
counter (timer), 9-50 
data page pointer (DP), 2-7, 3-5, 5-4 
DMA interrupt enable (DIE), 2-8, 3-8 
bit descriptions, 3-9 
extended precision (R0-R11), 2-6, 3-4
saving (example), 12-13 
global control (timer), 9-47 
IIOF flag register (IIF), 2-8, 3-12, 6-23,
6-24
index (IR1, IR0), 3-5
internal interrupt enable (IIE), 2-8, 3-10, 
6-23 
bit descriptions, 3-11 
optimization use, 12-96 
period (timer), 9-50 
pipeline conflicts, 10-8 
program counter (PC), 2-9, 2-26, 3-14 
repeat count (RC), 2-8, 3-14, 6-2, 12-26 
repeat end address (RE), 3-14, 6-2 
repeat start address (RS), 3-14, 6-2 
reserved bits, 3-14 
saved in context switches, 12-16 
stack pointer (SP), 2-8, 3-5 
application, 12-13 
status register (ST), 2-8, 3-5, 11-11 
bit descriptions, 3-6 
repeat count register (RC), 2-8, 3-14, 6-2, 
12-26 
repeat end address register (RE), 3-14, 6-2 
repeat mode, RPTS initialization, 6-4 
repeat modes (block, single instruction), 6-2 
initialization, 6-2 
optimization use, 12-95 
repeat start address register (RS), 3-14, 6-2 
reset, 3-17, 6-18, 12-3 
communication ports, 8-14 
memory interface control registers, 7-6 
operations performed, 6-22 
pin states, 6-18 
signal generation, 13-75 
timing, 14-28 
vector mapping, 3-17, 12-3 
vectors, 6-25 
RESETLOCx pins, 3-17, 12-3, 13-9 
RETIcond instruction, 6-9, 6-12 
RETIcondD instruction, 6-9, 12-22 
RETScond instruction, 6-9 
return from subroutine, 6-9 
RND instruction, 4-26 
ROM, 2-10
rounding of floating point value, 4-26 
RPTB and RPTBD instructions, 6-3, 12-23,
12-63 
optimization use, 12-95 
RPTS instruction, 6-4 
example, 12-23, 12-30 
optimization use, 12-95 
RSQRF instruction, 4-33, 12-38


segment start address (SSA) register, 3-25 
semaphores, 6-17 
shared bus interface, 13-43 
short floating-point format, 4-6 
SIGI instruction, 6-13, 7-44 
timing, 14-22 
signal descriptions, 14-7—14-10 
signal transition levels, 14-14 
signal-group control, 7-38 
simulator, 1-10 
software control, 6-1
software development tools, 1-9 
ANSI C compiler, 1-9 
assembler/linker, 1-9 
compiler, 1-9 
general, 1-9 
linker, 1-9 
simulator (state-accurate), 1-10 
SPOX operating system, 1-9 
XDS510 emulator, 1-9, B-1 
split mode (DMA), 9-20, 12-104 
SPOX operating system, 1-9 
square root, 12-38 
stack, 5-31, 5-33 
dequeues, 5-33 
queues, 5-33 
stack pointer (SP), 2-8, 3-5 
application, 12-13 
state diagram, port arbitration unit, 8-13 
status register (ST), 2-8, 3-5, 11-11 
bit descriptions, 3-6 
STFI instruction, 6-13, 7-42 
timing, 14-21 


STII instruction, 6-13, 7-42
timing, 14-21 
strobe settings, 7-8 
strobes, 7-12 
timing, 7-17 
wait states, 13-21 
SUBB instruction, 12-41 
SUBC instruction, 12-33 
subroutines, 12-11, 12-15 
calls. See calls 
subtraction, floating point, 4-20 
system configurations, 13-4 


test load circuit, 14-13 


three-operand addressing modes, 2-15, 5-20 


throughput, 1-4, 1-6 
communication port, 1-7 
DMA, 1-8

timer global control register, 9-47 
diagram, bit summary, 9-47 

timer registers, 3-22 

timers, 9-45—9-54 
applications, 12-97 
architecture, 2-29 
counter register, 9-45, 9-50 
global control register, 9-46, 9-47 
operation modes, 9-51
period control registers, 9-50 
period register, 9-45, 9-50 
timing, 14-37 

timing 

bus control, 14-19 
memory access, 7-17, 14-18
parameters, 14-15—14-26 
STRB, RDY, 7-17 

TCLK0,1 pins, 12-97

TMS320 family, products, 1-2

TOIEEE instruction, 12-43 

trap vector table (TVT), 3-15 
boot loader use, 13-7—13-8 

TRAPcond instruction, 6-9 

traps, 6-9, 6-11 
TSTB instruction, 12-28
TTL levels, 14-14
TVTP. See trap vector table (TVT) 
twiddle factor, 12-68, 12-78 
µ-law
compression, expansion, 12-46 
conversion, linear, 12-46 
underflow, 4-20 
unified mode (DMA), 9-20 
unpacking data example, 12-31 


vectors (reset, interrupts), 6-25 
wait states, 7-15, 7-36, 7-37, 13-20, 13-27
bus disabled, 7-38
consecutive reads, then write, 13-23 
consecutive writes, then read, 13-25 
multiple waits circuitry, 13-31 
requirements, 13-20 

word manipulation, 12-30 
XDS510 emulator, 1-9, B-1
XDS510 emulator design considerations, B-1